Podcast Episodes
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
Episode 1776
🤗 Upvotes: 106 | cs.CV, cs.AI, cs.HC
Authors:
Mingyu Ouyang, Siyuan Hu, Kevin Qinghong Lin, Hwee Tou Ng, Mike Zh…
2 months, 2 weeks ago
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
Episode 1775
🤗 Upvotes: 96 | cs.AI, cs.LG
Authors:
Haozhe Wang, Cong Wei, Weiming Ren, Jiaming Liu, Fangzhen Lin, Wenhu Chen
2 months, 2 weeks ago
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments
Episode 1774
🤗 Upvotes: 60 | cs.CV, cs.CL
Authors:
Dinging Li, Yingxiu Zhao, Xinrui Cheng, Kangheng Lin, Hongbo Peng, Hongxin…
2 months, 2 weeks ago
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation
Episode 1773
🤗 Upvotes: 50 | cs.CL
Authors:
Xiaomeng Hu, Yinger Zhang, Fei Huang, Jianhong Tu, Yang Su, Lianghao Deng, Yuxuan…
2 months, 2 weeks ago
Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
Episode 1772
🤗 Upvotes: 24 | cs.AI, cs.CL
Authors:
Kangsan Kim, Minki Kang, Taeil Kim, Yanlai Yang, Mengye Ren, Sung Ju Hwang…
2 months, 2 weeks ago
From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space
Episode 1771
🤗 Upvotes: 23 | cs.LG, cs.AI, cs.CL
Authors:
Yuqiao Tan, Minzheng Wang, Bo Liu, Zichen Liu, Tian Liang, Shizhu H…
2 months, 2 weeks ago
Exploration and Exploitation Errors Are Measurable for Language Model Agents
Episode 1770
🤗 Upvotes: 22 | cs.AI
Authors:
Jaden Park, Jungtaek Kim, Jongwon Jeong, Robert D. Nowak, Kangwook Lee, Yong Jae …
2 months, 2 weeks ago
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
Episode 1769
🤗 Upvotes: 123 | cs.LG, cs.AI, cs.CL, cs.CV
Authors:
Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, …
2 months, 2 weeks ago
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Episode 1768
🤗 Upvotes: 62 | cs.LG, cs.AI, cs.CL
Authors:
Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Ch…
2 months, 2 weeks ago
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
Episode 1767
🤗 Upvotes: 27 | cs.AI, cs.LG
Authors:
Jiachen Zhu, Lingyu Yang, Rong Shan, Congmin Zheng, Zeyu Zheng, Weiwen Liu…
2 months, 2 weeks ago