Podcast Episodes

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

Episode 1776

🤗 Upvotes: 106 | cs.CV, cs.AI, cs.HC

Authors:
Mingyu Ouyang, Siyuan Hu, Kevin Qinghong Lin, Hwee Tou Ng, Mike Zh…

2 months, 2 weeks ago

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

Episode 1775

🤗 Upvotes: 96 | cs.AI, cs.LG

Authors:
Haozhe Wang, Cong Wei, Weiming Ren, Jiaming Liu, Fangzhen Lin, Wenhu Chen…

2 months, 2 weeks ago

SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

Episode 1774

🤗 Upvotes: 60 | cs.CV, cs.CL

Authors:
Dinging Li, Yingxiu Zhao, Xinrui Cheng, Kangheng Lin, Hongbo Peng, Hongxin…

2 months, 2 weeks ago

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation

Episode 1773

🤗 Upvotes: 50 | cs.CL

Authors:
Xiaomeng Hu, Yinger Zhang, Fei Huang, Jianhong Tu, Yang Su, Lianghao Deng, Yuxuan…

2 months, 2 weeks ago

Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents

Episode 1772

🤗 Upvotes: 24 | cs.AI, cs.CL

Authors:
Kangsan Kim, Minki Kang, Taeil Kim, Yanlai Yang, Mengye Ren, Sung Ju Hwang…

2 months, 2 weeks ago

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

Episode 1771

🤗 Upvotes: 23 | cs.LG, cs.AI, cs.CL

Authors:
Yuqiao Tan, Minzheng Wang, Bo Liu, Zichen Liu, Tian Liang, Shizhu H…

2 months, 2 weeks ago

Exploration and Exploitation Errors Are Measurable for Language Model Agents

Episode 1770

🤗 Upvotes: 22 | cs.AI

Authors:
Jaden Park, Jungtaek Kim, Jongwon Jeong, Robert D. Nowak, Kangwook Lee, Yong Jae …

2 months, 2 weeks ago

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Episode 1769

🤗 Upvotes: 123 | cs.LG, cs.AI, cs.CL, cs.CV

Authors:
Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, …

2 months, 2 weeks ago

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Episode 1768

🤗 Upvotes: 62 | cs.LG, cs.AI, cs.CL

Authors:
Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Ch…

2 months, 2 weeks ago

Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization

Episode 1767

🤗 Upvotes: 27 | cs.AI, cs.LG

Authors:
Jiachen Zhu, Lingyu Yang, Rong Shan, Congmin Zheng, Zeyu Zheng, Weiwen Liu…

2 months, 2 weeks ago
