The offline RL problem, Conservative Q-Learning (CQL), Decision Transformers, imitation learning, limitations of behavioral cloning, DAgger, inverse RL, GAIL, AMP, offline-to-online finetuning, and RLHF basics. Chapters 71–80.
Learning objectives
- Collect a dataset of transitions (state, action, reward, next_state, done) from a random policy (or a fixed behavior policy) in the Hopper environment.
- Train a standard SAC agent offline (no environment interaction) on this dataset and observe the overestimation of Q-values for out-of-distribution (OOD) actions.
- Explain why naive off-policy methods fail in offline RL: the policy is trained to maximize Q, but Q is only trained on in-distribution actions, so Q can be overestimated for OOD actions.
- Identify the distributional shift between the behavior policy (which collected the data) and the learned policy.
- Relate the offline RL problem to recommendation and healthcare, where data comes from logs or historical trials.

Concept and real-world RL
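The overestimation failure mode can be seen without any neural networks. Below is a minimal numpy sketch (a toy one-state problem invented here, not the Hopper setup): actions the behavior policy rarely logged get noisy Q estimates, and the greedy max tends to pick one of them and overestimate its value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: one state, 50 discrete actions, true Q = 0 for every action.
n_actions = 50
true_q = np.zeros(n_actions)

# Actions 0-9 are well covered by the behavior policy (many samples);
# actions 10-49 are rare in the log (2 samples each), i.e. effectively OOD.
samples_per_action = np.where(np.arange(n_actions) < 10, 500, 2)

# Monte-Carlo Q estimates: sample mean of noisy returns per action.
q_hat = np.array([rng.normal(0.0, 1.0, n).mean() for n in samples_per_action])

greedy = int(np.argmax(q_hat))
print(q_hat[greedy])   # positive: overestimates the true Q of 0
print(greedy)          # almost surely an under-covered (OOD) action
```

Fewer samples mean higher-variance estimates, and a max over many noisy estimates is biased upward; this is the same mechanism that bites a SAC critic trained offline and queried on policy actions outside the data.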
...
Learning objectives
- Implement the CQL loss: add a term that penalizes Q-values for actions drawn from the current policy (or a uniform distribution), so that Q is lower for out-of-distribution actions.
- Apply CQL to the offline dataset from Chapter 71 and train an offline SAC (or similar) with the CQL regularizer.
- Compare the learned policy’s evaluation return and Q-values with naive SAC on the same dataset.
- Explain why penalizing Q for OOD actions helps avoid overestimation and improves offline performance.
- Relate CQL to recommendation and healthcare, where we must learn from fixed logs without overestimating unseen actions.

Concept and real-world RL
...
Learning objectives
- Implement a Decision Transformer: a transformer (GPT-style) model that takes sequences of (returns-to-go, state, action) tokens and predicts actions conditioned on the desired return and past states/actions.
- Explain the formulation: at each timestep, the input is (R_t, s_t, a_{t-1}, R_{t-1}, s_{t-1}, …), where R_t is the return from t onward, and the model predicts a_t. Training is supervised on offline trajectories.
- Train the model on a simple environment’s offline dataset and test it by conditioning on different returns-to-go (e.g. a high return for “expert” behavior).
- Compare with offline RL (e.g. CQL) in terms of implementation and how the policy is extracted (conditioning vs. maximization).
- Relate Decision Transformers to recommendation (sequences of user-item-reward) and dialogue (conditioning on a desired outcome).

Concept and real-world RL
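Before the transformer itself, the data plumbing is worth sketching: computing returns-to-go and interleaving (R_t, s_t, a_t) tokens. The helper names below are invented for illustration; the model (omitted) would consume these token sequences with a causal mask and be trained to predict each a_t.

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """R_t = sum_{k >= t} gamma^(k - t) * r_k, computed right-to-left."""
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def make_dt_sequence(states, actions, rewards):
    """Interleave (R_t, s_t, a_t) tokens; a Decision Transformer is trained
    to predict a_t from everything up to and including (R_t, s_t)."""
    rtg = returns_to_go(rewards)
    tokens = []
    for R, s, a in zip(rtg, states, actions):
        tokens += [("rtg", R), ("state", s), ("action", a)]
    return tokens

print(returns_to_go([1.0, 1.0, 1.0]))  # [3. 2. 1.]
```

At test time, the same format is used but the desired return is supplied manually as the first R token and decremented by each observed reward, which is how "conditioning on a high return" is implemented.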
...
Learning objectives
- Collect expert demonstrations (state-action pairs or trajectories) from a trained PPO agent on LunarLander.
- Train a behavioral cloning (BC) agent: supervised learning to predict the expert’s action given the state.
- Evaluate the BC policy in the environment and compare its return and behavior to the expert’s.
- Explain the assumptions of behavioral cloning (i.i.d. states from the expert distribution) and when it works well.
- Relate imitation learning to robot navigation (learning from human demos) and dialogue (learning from human responses).

Concept and real-world RL
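Since BC is just supervised learning, the whole idea fits in a few lines. A minimal sketch on an invented 1-D task (not LunarLander): the "expert" is a threshold rule, and the cloned policy is a logistic regression fit to its state-action pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D task: the expert picks action 1 iff the state is > 0.
states = rng.uniform(-1.0, 1.0, size=500)
actions = (states > 0.0).astype(float)

# Behavioral cloning = supervised learning: logistic regression on
# (state, expert action) pairs, fit by gradient descent.
w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    probs = 1.0 / (1.0 + np.exp(-(w * states + b)))
    grad = probs - actions                 # dLoss/dlogits, cross-entropy
    w -= lr * np.mean(grad * states)
    b -= lr * np.mean(grad)

bc_actions = (1.0 / (1.0 + np.exp(-(w * states + b))) > 0.5).astype(float)
acc = np.mean(bc_actions == actions)
```

Note what the loop never touches: the environment. BC only matches the expert on states the expert visited, which is exactly the assumption (i.i.d. expert states) that the next chapter's covariate-shift demo breaks.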
...
Learning objectives
- Demonstrate the covariate shift problem: run the BC agent, record states it visits that were rare or absent in the expert data, and show that errors compound in those regions.
- Implement DAgger: collect new data by running the current BC policy (or a mix of expert and BC), query the expert for the correct action at those states, add the labeled states to the dataset, and retrain BC.
- Explain why DAgger reduces covariate shift by adding on-policy (or mixed) states to the training set.
- Compare BC (trained only on expert data) with DAgger (iteratively aggregated data) in terms of evaluation return and robustness.
- Relate covariate shift and DAgger to robot navigation and healthcare, where the learner’s distribution can drift from the expert’s.

Concept and real-world RL
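The DAgger loop itself is short. A toy sketch (all dynamics, ranges, and helper names invented here): the initial BC data only covers states below zero, the learner drifts into uncovered territory, and each DAgger round labels the learner's own visited states with the expert and retrains.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_action(s):
    # Hypothetical expert on a 1-D state: action 0 left of zero, else 1.
    return 0.0 if s < 0.0 else 1.0

def fit_1nn(X, y):
    # Simplest possible learner: 1-nearest-neighbor over the dataset.
    def policy(s):
        return y[np.argmin(np.abs(X - s))]
    return policy

def rollout(policy, s0=-1.0, steps=20):
    # Toy dynamics: the agent drifts right, so it eventually visits
    # states the original expert data never covered.
    s, visited = s0, []
    for _ in range(steps):
        visited.append(s)
        s += 0.1 if policy(s) == 0.0 else 0.2
    return np.array(visited)

# Round 0: plain BC data, drawn only from the expert's state region.
X = rng.uniform(-1.0, 0.0, size=30)
y = np.array([expert_action(s) for s in X])

# DAgger: run the learner, have the expert label the *learner's* states,
# aggregate, retrain.
for _ in range(3):
    policy = fit_1nn(X, y)
    visited = rollout(policy)
    X = np.concatenate([X, visited])
    y = np.concatenate([y, [expert_action(s) for s in visited]])

final_policy = fit_1nn(X, y)
grid = np.linspace(-1.0, 1.0, 101)
acc = np.mean([final_policy(s) == expert_action(s) for s in grid])
```

The key line is who generates the states (the learner) versus who generates the labels (the expert); that split is what moves the training distribution toward the states the learner actually sees.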
...
Learning objectives
- Implement maximum entropy IRL: given expert trajectories, learn a reward function such that the expert’s policy (approximately) maximizes expected return under that reward.
- Use a linear reward model (e.g. r(s, a) = w^T φ(s, a)) and forward RL (e.g. value iteration or policy gradient) to compute the optimal policy for the current reward.
- Iterate between updating the reward to make the expert look better than other policies and solving the forward RL problem.
- Explain why IRL can recover a reward that explains the expert’s behavior and then generalize (e.g. to new states) better than pure BC in some settings.
- Relate IRL to robot navigation (recovering intent from demonstrations) and healthcare (inferring treatment objectives).

Concept and real-world RL
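For a linear reward, the MaxEnt IRL gradient is the expert feature expectation minus the current policy's feature expectation. A minimal sketch on an invented one-state problem, where the forward RL step collapses to a softmax over action rewards (in a real MDP it would be value iteration or policy gradient):

```python
import numpy as np

# Single-state MDP with 4 actions and a linear reward r(a) = w . phi(a).
phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0],
                [0.5, 0.5]])

# Hypothetical expert demonstrations: the expert always picks action 2,
# so the expert feature expectation is phi[2].
mu_expert = phi[2]

w = np.zeros(2)
lr = 0.5
for _ in range(200):
    # Forward problem: in this one-state case the MaxEnt-optimal policy
    # is just a softmax over action rewards.
    r = phi @ w
    pi = np.exp(r - r.max())
    pi /= pi.sum()
    # MaxEnt IRL gradient: expert features minus policy features.
    w += lr * (mu_expert - pi @ phi)

r = phi @ w
pi = np.exp(r - r.max())
pi /= pi.sum()
```

The update makes the expert "look better than other policies" exactly when the policy's feature expectation still differs from the expert's; at convergence the induced policy concentrates on the demonstrated action.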
...
Learning objectives
- Implement GAIL: train a discriminator D(s, a) to distinguish state-action pairs from the expert vs. from the current policy, and use the discriminator output (or log D) as the reward for a policy gradient method.
- Train the policy to maximize the discriminator reward (i.e. to fool the discriminator) while the discriminator tries to tell expert from agent.
- Test on a simple task (e.g. CartPole or MuJoCo) and compare imitation quality with behavioral cloning.
- Explain the connection to GANs: the policy is the generator, and the discriminator provides the learning signal.
- Relate GAIL to robot navigation and game AI, where we have expert demos and want to match the expert distribution without hand-designed rewards.

Concept and real-world RL
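The generator/discriminator alternation can be sketched in a one-state "bandit" setting (everything here is an invented toy, not CartPole or MuJoCo): the discriminator is a per-action logistic model, and the policy is updated by REINFORCE with log D(a) as the surrogate reward.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 4
# Hypothetical expert demos in a one-state task: mostly action 3.
expert_actions = rng.choice(n_actions, size=2000, p=[0.05, 0.05, 0.05, 0.85])

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

pol_logits = np.zeros(n_actions)     # the "generator" (policy)
disc_logits = np.zeros(n_actions)    # D(a) = sigmoid(disc_logits[a])
lr_pi, lr_d, batch = 0.5, 0.5, 200

for _ in range(300):
    pi = softmax(pol_logits)
    agent_a = rng.choice(n_actions, size=batch, p=pi)
    expert_a = rng.choice(expert_actions, size=batch)

    # Discriminator step: logistic regression, expert = 1, agent = 0.
    d = 1.0 / (1.0 + np.exp(-disc_logits))
    grad_d = np.zeros(n_actions)
    np.add.at(grad_d, expert_a, 1.0 - d[expert_a])
    np.add.at(grad_d, agent_a, -d[agent_a])
    disc_logits += lr_d * grad_d / batch

    # Policy step: REINFORCE with surrogate reward log D(a).
    reward = np.log(1.0 / (1.0 + np.exp(-disc_logits)))
    baseline = reward @ pi
    grad_pi = np.zeros(n_actions)
    for a in agent_a:
        score = -pi.copy()
        score[a] += 1.0                # grad of log pi(a) wrt logits
        grad_pi += (reward[a] - baseline) * score
    pol_logits += lr_pi * grad_pi / batch

pi = softmax(pol_logits)
```

At the equilibrium the discriminator cannot tell expert from agent (D near 0.5), which happens precisely when the policy's action distribution matches the expert's; that is the GAN correspondence in miniature.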
...
Learning objectives
- Read the AMP paper and explain how it combines a task reward (e.g. velocity tracking, goal reaching) with an adversarial style reward (a discriminator that scores motion similarity to reference data).
- Write the combined reward function: r = r_task + λ r_style, where r_style comes from a discriminator trained to distinguish agent motion from reference (e.g. motion capture) data.
- Identify why adding a style reward helps produce natural-looking and robust locomotion compared to a task-only reward.
- Relate AMP to robot navigation and game AI (character animation), where we want both task success and natural motion.

Concept and real-world RL
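The combined reward is straightforward to write down. A minimal sketch (function names invented here): the style term below uses the common GAN-style choice -log(1 - D); the AMP paper itself uses a least-squares discriminator with its own reward mapping, so treat this as the generic shape of the formula, not the paper's exact variant.

```python
import numpy as np

def style_reward(d_score, eps=1e-6):
    """Style reward from a discriminator score D in (0, 1): higher when
    the discriminator thinks the motion resembles the reference data.
    -log(1 - D) is one common GAN-style mapping; AMP uses a
    least-squares variant."""
    return -np.log(np.clip(1.0 - d_score, eps, 1.0))

def amp_reward(r_task, d_score, lam=0.5):
    # Combined objective from the chapter: r = r_task + lambda * r_style.
    return r_task + lam * style_reward(d_score)
```

The weight λ is the knob the chapter's question is about: λ = 0 recovers task-only training (often effective but unnatural motion), while large λ prioritizes matching the reference style over task success.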
...
Learning objectives
- Pretrain a SAC (or similar) agent offline on a fixed dataset (e.g. from a mix of policies, or from Chapter 71).
- Finetune the agent online by continuing training with environment interaction.
- Compare the learning curve (return vs. steps) of finetuning from offline pretraining against training from scratch.
- Implement a Q-filter: when updating the policy, avoid or downweight updates that use actions whose Q-value is below a threshold, to avoid reinforcing “bad” actions that could destabilize the policy.
- Relate offline-to-online to recommendation (pretrain on logs, then A/B test) and healthcare (pretrain on historical data, then make cautious online updates).

Concept and real-world RL
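The Q-filter is a small piece of machinery: per-sample weights that zero out (or downweight) policy-update terms for low-Q actions. A hedged sketch with an invented batch-percentile threshold; real implementations vary in how the threshold is chosen and whether the masking is hard or soft.

```python
import numpy as np

def q_filter(q_values, percentile=30.0):
    """Binary weights for a policy update: drop (zero-weight) actions
    whose Q-value falls in the bottom `percentile` of the batch. A
    sketch of the Q-filter idea; soft downweighting is also possible."""
    q = np.asarray(q_values, dtype=float)
    threshold = np.percentile(q, percentile)
    return (q >= threshold).astype(float)

# Usage: multiply per-sample policy losses by these weights before
# averaging, so low-Q actions contribute nothing to the update.
weights = q_filter([0.2, 1.5, -0.7, 0.9, 0.1])
```

During the offline-to-online transition the pretrained Q-function is the only guardrail, so filtering updates through it is a cheap way to keep early online exploration from reinforcing actions the critic already considers poor.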
...
Learning objectives
- Implement a Bradley-Terry model to learn a reward function from pairwise comparisons of trajectories (or segments): given (τ^w, τ^l), meaning “τ^w is preferred over τ^l,” fit r so that E[r(τ^w)] > E[r(τ^l)].
- Use the learned reward to train a policy with PPO (or another policy gradient method): maximize expected return under r.
- Explain the RLHF pipeline: collect preferences → train a reward model → train the policy against the reward model.
- Test on a simple environment with simulated preferences (e.g. prefer longer / higher-return trajectories) and verify that the policy improves.
- Relate RLHF to dialogue (prefer helpful/harmless responses) and recommendation (prefer engaging content).

Concept and real-world RL
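The reward-model step can be sketched end to end with a linear reward and simulated preferences (all features, sizes, and the hidden preference vector below are invented for illustration). The Bradley-Terry model says P(τ^w preferred over τ^l) = sigmoid(r(τ^w) - r(τ^l)), fit by gradient ascent on the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each trajectory is summarized by a feature vector phi(tau); the reward
# model is linear, r(tau) = w . phi(tau).
phi = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])      # hidden "annotator" preference

# Simulated preferences: tau_i preferred over tau_j iff its true reward
# is higher (noiseless labels, for simplicity).
idx = rng.integers(0, 100, size=(500, 2))
idx = idx[idx[:, 0] != idx[:, 1]]
winner_first = phi[idx[:, 0]] @ w_true >= phi[idx[:, 1]] @ w_true
winners = np.where(winner_first, idx[:, 0], idx[:, 1])
losers = np.where(winner_first, idx[:, 1], idx[:, 0])
diff = phi[winners] - phi[losers]        # (n_pairs, 3)

# Bradley-Terry log-likelihood ascent: push r(tau_w) above r(tau_l).
w = np.zeros(3)
lr = 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(diff @ w)))
    w += lr * ((1.0 - p)[:, None] * diff).mean(axis=0)

# The learned reward should reproduce the preference ordering.
agree = np.mean(diff @ w > 0)
```

The output `w` is the reward model; the remaining pipeline stage (not shown) trains a policy with PPO against r(τ) = w·φ(τ), closing the collect-preferences → reward model → policy loop.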
...