Podcast Episodes

MARS: Enabling Autoregressive Models Multi-Token Generation

Episode 1746

🤗 Upvotes: 25 | cs.CL

Authors:
Ziqi Jin, Lei Wang, Ziwei Luo, Aixin Sun

2 months, 3 weeks ago

Combee: Scaling Prompt Learning for Self-Improving Language Model Agents

Episode 1745

🤗 Upvotes: 22 | cs.AI, cs.CL, cs.LG

Authors:
Hanchen Li, Runyuan He, Qizheng Zhang, Changxiu Ji, Qiuyang Mang, X…

2 months, 3 weeks ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Episode 1744

🤗 Upvotes: 201 | cs.CV

Authors:
Chaoyou Fu, Haozhi Yuan, Yuhao Dong, Yi-Fan Zhang, Yunhang Shen, Xiaoxing Hu, Xu…

2 months, 3 weeks ago

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

Episode 1743

🤗 Upvotes: 98 | cs.AI

Authors:
Bowen Ye, Rang Li, Qibin Yang, Yuanxin Liu, Linli Yao, Hanglong Lv, Zhihui Xie, C…

2 months, 3 weeks ago

Learning to Retrieve from Agent Trajectories

Episode 1742

🤗 Upvotes: 55 | cs.IR, cs.AI, cs.CL

Authors:
Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen

2 months, 3 weeks ago

ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation

Episode 1741

🤗 Upvotes: 47 | cs.LG

Authors:
Hui Sun, Yun-Ji Zhang, Zheng Xie, Ren-Biao Liu, Yali Du, Xin-Ye Li, Ming Li

2 months, 3 weeks ago

GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers

Episode 1740

🤗 Upvotes: 37 | cs.SE, cs.AI

Authors:
Shufan Jiang, Chios Chen, Zhiyang Chen

2 months, 3 weeks ago

Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning

Episode 1739

🤗 Upvotes: 33 | cs.PF, cs.SE

Authors:
Qisheng Su, Shiting Huang, Zhen Fang, Ziyan Chen, Zehui Chen, Feng Zhao

2 months, 3 weeks ago

ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

Episode 1738

🤗 Upvotes: 32 | cs.AI

Authors:
Difan Jiao, Qianfeng Wen, Blair Yang, Zhenwei Tang, Ashton Anderson

2 months, 3 weeks ago

Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision

Episode 1737

🤗 Upvotes: 31 | cs.CV

Authors:
Hyunsoo Cha, Wonjung Woo, Byungjun Kim, Hanbyul Joo

2 months, 3 weeks ago
