Podcast Episodes

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation

Episode 1773

🤗 Upvotes: 50 | cs.CL

Authors:
Xiaomeng Hu, Yinger Zhang, Fei Huang, Jianhong Tu, Yang Su, Lianghao Deng, Yuxuan…

2 weeks, 5 days ago

Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents

Episode 1772

🤗 Upvotes: 24 | cs.AI, cs.CL

Authors:
Kangsan Kim, Minki Kang, Taeil Kim, Yanlai Yang, Mengye Ren, Sung Ju Hwang…

2 weeks, 5 days ago

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

Episode 1771

🤗 Upvotes: 23 | cs.LG, cs.AI, cs.CL

Authors:
Yuqiao Tan, Minzheng Wang, Bo Liu, Zichen Liu, Tian Liang, Shizhu H…

2 weeks, 5 days ago

Exploration and Exploitation Errors Are Measurable for Language Model Agents

Episode 1770

🤗 Upvotes: 22 | cs.AI

Authors:
Jaden Park, Jungtaek Kim, Jongwon Jeong, Robert D. Nowak, Kangwook Lee, Yong Jae …

2 weeks, 5 days ago

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Episode 1769

🤗 Upvotes: 123 | cs.LG, cs.AI, cs.CL, cs.CV

Authors:
Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, …

2 weeks, 6 days ago

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Episode 1768

🤗 Upvotes: 62 | cs.LG, cs.AI, cs.CL

Authors:
Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Ch…

2 weeks, 6 days ago

Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization

Episode 1767

🤗 Upvotes: 27 | cs.AI, cs.LG

Authors:
Jiachen Zhu, Lingyu Yang, Rong Shan, Congmin Zheng, Zeyu Zheng, Weiwen Liu…

2 weeks, 6 days ago

SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks

Episode 1766

🤗 Upvotes: 26 | cs.AI

Authors:
Tianyi Wang, Yixia Li, Long Li, Yibiao Chen, Shaohan Huang, Yun Chen, Peng Li, Ya…

2 weeks, 6 days ago

Toward Autonomous Long-Horizon Engineering for ML Research

Episode 1765

🤗 Upvotes: 25 | cs.CL

Authors:
Guoxin Chen, Jie Chen, Lei Chen, Jiale Zhao, Fanzhe Meng, Wayne Xin Zhao, Ruihua …

2 weeks, 6 days ago

BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation

Episode 1764

🤗 Upvotes: 21 | cs.CL, cs.AI

Authors:
Hippolyte Gisserot-Boukhlef, Nicolas Boizard, Emmanuel Malherbe, Céline Hu…

2 weeks, 6 days ago

