Direct Preference Optimization
Step 1 — Vol 10 · Ch 7
Chapter 97: Direct Preference Optimization (DPO)
Derive the DPO loss from the Bradley-Terry preference model and the KL-regularized optimal policy, then compare DPO with PPO.
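To make the loss concrete, here is a minimal sketch of the per-pair DPO objective in its commonly stated form: the negative log-sigmoid of beta times the difference between the policy-to-reference log-ratios of the chosen and rejected responses. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, not something this chapter specifies. The inputs are assumed to be sequence-level log-probabilities, already summed over tokens.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair (hypothetical helper).

    All four inputs are assumed to be sequence-level log-probabilities.
    """
    # Implicit rewards: beta * log(pi_theta(y|x) / pi_ref(y|x)).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # Bradley-Terry: P(chosen beats rejected) = sigmoid(margin).
    # The loss is -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # evaluated in a numerically stable way for either sign of margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log 2.
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
    # Policy raises the chosen response relative to the reference: lower loss.
    print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

Unlike PPO, this objective needs no separately trained reward model and no sampling from the policy during training; it only needs log-probabilities from the policy and a frozen reference model on a fixed set of preference pairs.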