Podcast Episodes
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Episode 836
🤗 Upvotes: 84 | cs.LG, cs.AI, cs.CL
Authors:
Ganqu Cui, Yuchen Zhang, Jiacheng Chen, Lifan Yuan, Zhi Wang, Yuxin…
11 months, 1 week ago
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
Episode 835
🤗 Upvotes: 63 | cs.SE, cs.CL
Authors:
Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich, Anton Shevtso…
11 months, 1 week ago
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Episode 834
🤗 Upvotes: 59 | cs.CL, cs.AI, cs.LG, cs.PF, I.2.7
Authors:
Tianyu Fu, Yi Ge, Yichen You, Enshu Liu, Zhihang Yuan…
11 months, 1 week ago