Podcast Episodes
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
Episode 1743
🤗 Upvotes: 98 | cs.AI
Authors:
Bowen Ye, Rang Li, Qibin Yang, Yuanxin Liu, Linli Yao, Hanglong Lv, Zhihui Xie, C…
3 weeks, 6 days ago
Learning to Retrieve from Agent Trajectories
Episode 1742
🤗 Upvotes: 55 | cs.IR, cs.AI, cs.CL
Authors:
Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen
3 weeks, 6 days ago
ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation
Episode 1741
🤗 Upvotes: 47 | cs.LG
Authors:
Hui Sun, Yun-Ji Zhang, Zheng Xie, Ren-Biao Liu, Yali Du, Xin-Ye Li, Ming Li
3 weeks, 6 days ago
GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers
Episode 1740
🤗 Upvotes: 37 | cs.SE, cs.AI
Authors:
Shufan Jiang, Chios Chen, Zhiyang Chen
3 weeks, 6 days ago
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning
Episode 1739
🤗 Upvotes: 33 | cs.PF, cs.SE
Authors:
Qisheng Su, Shiting Huang, Zhen Fang, Ziyan Chen, Zehui Chen, Feng Zhao
3 weeks, 6 days ago
ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement
Episode 1738
🤗 Upvotes: 32 | cs.AI
Authors:
Difan Jiao, Qianfeng Wen, Blair Yang, Zhenwei Tang, Ashton Anderson
3 weeks, 6 days ago
Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision
Episode 1737
🤗 Upvotes: 31 | cs.CV
Authors:
Hyunsoo Cha, Wonjung Woo, Byungjun Kim, Hanbyul Joo
3 weeks, 6 days ago
MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU
Episode 1736
🤗 Upvotes: 26 | cs.CL, cs.DC, cs.OS
Authors:
Zhengqing Yuan, Hanchi Sun, Lichao Sun, Yanfang Ye
3 weeks, 6 days ago
Watch Before You Answer: Learning from Visually Grounded Post-Training
Episode 1735
🤗 Upvotes: 26 | cs.CV, cs.AI, cs.CL
Authors:
Yuxuan Zhang, EunJeong Hwang, Huaisong Zhang, Penghui Du, Yiming Ji…
3 weeks, 6 days ago
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
Episode 1734
🤗 Upvotes: 152 | cs.CV
Authors:
DataFlow Team, Bohan Zeng, Daili Hua, Kaixin Zhu, Yifan Dai, Bozhou Li, Yuran Wa…
4 weeks ago