Podcast Episodes

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

Episode 1743

🤗 Upvotes: 98 | cs.AI

Authors:
Bowen Ye, Rang Li, Qibin Yang, Yuanxin Liu, Linli Yao, Hanglong Lv, Zhihui Xie, C…

3 weeks, 6 days ago

Learning to Retrieve from Agent Trajectories

Episode 1742

🤗 Upvotes: 55 | cs.IR, cs.AI, cs.CL

Authors:
Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen

3 weeks, 6 days ago

ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation

Episode 1741

🤗 Upvotes: 47 | cs.LG

Authors:
Hui Sun, Yun-Ji Zhang, Zheng Xie, Ren-Biao Liu, Yali Du, Xin-Ye Li, Ming Li

3 weeks, 6 days ago

GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers

Episode 1740

🤗 Upvotes: 37 | cs.SE, cs.AI

Authors:
Shufan Jiang, Chios Chen, Zhiyang Chen

3 weeks, 6 days ago

Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning

Episode 1739

🤗 Upvotes: 33 | cs.PF, cs.SE

Authors:
Qisheng Su, Shiting Huang, Zhen Fang, Ziyan Chen, Zehui Chen, Feng Zhao

3 weeks, 6 days ago

ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

Episode 1738

🤗 Upvotes: 32 | cs.AI

Authors:
Difan Jiao, Qianfeng Wen, Blair Yang, Zhenwei Tang, Ashton Anderson

3 weeks, 6 days ago

Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision

Episode 1737

🤗 Upvotes: 31 | cs.CV

Authors:
Hyunsoo Cha, Wonjung Woo, Byungjun Kim, Hanbyul Joo

3 weeks, 6 days ago

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Episode 1736

🤗 Upvotes: 26 | cs.CL, cs.DC, cs.OS

Authors:
Zhengqing Yuan, Hanchi Sun, Lichao Sun, Yanfang Ye

3 weeks, 6 days ago

Watch Before You Answer: Learning from Visually Grounded Post-Training

Episode 1735

🤗 Upvotes: 26 | cs.CV, cs.AI, cs.CL

Authors:
Yuxuan Zhang, EunJeong Hwang, Huaisong Zhang, Penghui Du, Yiming Ji…

3 weeks, 6 days ago

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Episode 1734

🤗 Upvotes: 152 | cs.CV

Authors:
DataFlow Team, Bohan Zeng, Daili Hua, Kaixin Zhu, Yifan Dai, Bozhou Li, Yuran Wa…

4 weeks ago
