Episode Details
[Episode 63] DPO or PPO: How Should Preference Feedback Be Used?
Published 1 year, 6 months ago
Description
Seventy3: Turning papers into podcasts with NotebookLM, so everyone can keep learning alongside AI.
Today's topic:
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Summary
This NeurIPS 2024 paper investigates the effectiveness of different components in preference-based learning for language models. The authors systematically compare Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) algorithms, ...
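For readers new to the two methods, a minimal sketch of the per-example objectives being compared may help. It assumes the standard textbook formulations (the DPO loss and PPO's clipped surrogate), not the paper's own code. The function names, argument names, and default hyperparameters (`beta=0.1`, `eps=0.2`) are illustrative assumptions, and PPO's reward model, value function, and KL penalty are deliberately omitted.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(margin)).

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the trainable policy and the frozen reference model.
    (Hypothetical helper for illustration.)
    """
    # How much more the policy, relative to the reference, prefers chosen over rejected.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable -log(sigmoid(m)) = log(1 + exp(-m)) = softplus(-m).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def ppo_clip_loss(logp_new: float, logp_old: float, advantage: float,
                  eps: float = 0.2) -> float:
    """Per-token PPO clipped surrogate, negated so that lower is better.

    (Hypothetical helper; the advantage is assumed to come from elsewhere,
    e.g. a reward model plus value baseline, which this sketch omits.)
    """
    ratio = math.exp(logp_new - logp_old)  # importance ratio pi_new / pi_old
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Pessimistic bound: keep the smaller of the unclipped and clipped objectives.
    return -min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Policy favors the chosen response more than the reference does.
    print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
    # A ratio of 2 with a positive advantage is capped at 1 + eps.
    print(ppo_clip_loss(math.log(2.0), 0.0, advantage=1.0))
```

When the policy and reference log-ratios are equal, the DPO loss is log 2 (about 0.693), and it falls as the policy shifts probability toward the chosen response. The DPO sketch needs only a fixed preference pair and a reference model, while the PPO sketch consumes an advantage signal that, in a full pipeline, is produced from a learned reward model on freshly sampled responses.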