Chapter 95: Training Large Language Models with PPO
Fine-tune a small language model (e.g. GPT-2) with PPO on a sentiment reward, using a KL penalty to keep the fine-tuned policy close to the original model.
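
To make the role of the KL penalty concrete, here is a minimal sketch of one common way to shape per-token rewards in this setting. It is an illustration under stated assumptions, not the chapter's implementation: the function name `kl_penalized_rewards`, the coefficient `beta`, and the convention of adding the sentiment score only at the final token are all hypothetical choices made for this example. The per-token log-probabilities are assumed to come from the policy being trained and from a frozen copy of the original model.

```python
from typing import List


def kl_penalized_rewards(
    policy_logprobs: List[float],
    ref_logprobs: List[float],
    sentiment_score: float,
    beta: float = 0.1,
) -> List[float]:
    """Return per-token rewards for one generated sequence.

    Assumptions (illustrative, not taken from the chapter):
    - policy_logprobs[t] is log pi(token_t | context) under the policy in training.
    - ref_logprobs[t] is the same quantity under the frozen reference model.
    - sentiment_score is a scalar from a sentiment classifier for the whole text.
    - beta scales the KL penalty; 0.1 is an arbitrary placeholder value.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("policy and reference log-probs must have equal length")
    if not policy_logprobs:
        raise ValueError("sequence must contain at least one token")

    # Sampled per-token KL estimate: log pi(token) - log pi_ref(token).
    # Each token is penalized in proportion to how far the policy has drifted.
    rewards = [-beta * (lp - lr) for lp, lr in zip(policy_logprobs, ref_logprobs)]

    # The sentiment reward describes the full sequence, so add it at the last token.
    rewards[-1] += sentiment_score
    return rewards


if __name__ == "__main__":
    example = kl_penalized_rewards(
        policy_logprobs=[-1.0, -0.5, -2.0],
        ref_logprobs=[-1.2, -0.5, -1.5],
        sentiment_score=1.0,
        beta=0.1,
    )
    print(example)
```

With `beta = 0` the rewards reduce to the sentiment score alone, which lets the policy drift arbitrarily far from the original model; a larger `beta` trades sentiment reward for staying close to it.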