Chapter 46: Maximum Entropy RL

Learning objectives
- Derive or state the maximum entropy objective: maximize \(\mathbb{E}\big[ \sum_t r_t + \alpha \mathcal{H}(\pi(\cdot|s_t)) \big]\) (or an equivalent form), where \(\mathcal{H}\) is entropy.
- Explain how the entropy term encourages exploration: higher entropy means a more uniform action distribution, so the policy tries more actions.
- Contrast with standard expected-return maximization (no entropy bonus).

Concept and real-world RL
Maximum entropy RL adds an entropy bonus to the objective so the agent maximizes both return and policy entropy. The optimal policy under this objective is more stochastic (it explores more) and is often easier to learn (multiple modes, robustness). In robot control, SAC (Soft Actor-Critic) uses this idea with automatic temperature tuning; in game AI and recommendation, entropy regularization (e.g. in PPO) keeps the policy from becoming too deterministic too fast. The temperature \(\alpha\) (or an equivalent coefficient) controls the trade-off between return and entropy. ...
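The per-step objective above is just expected return plus an entropy bonus, which can be checked numerically. A minimal sketch (function names `entropy` and `soft_objective` are my own, not from any library): a uniform policy earns the largest entropy bonus, a near-deterministic one almost none.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) log pi(a)."""
    probs = np.asarray(probs, dtype=float)
    return float(-np.sum(probs * np.log(probs + 1e-12)))

def soft_objective(expected_return, probs, alpha):
    """Per-step maximum entropy objective: E[r] + alpha * H(pi)."""
    return expected_return + alpha * entropy(probs)

# Uniform over 4 actions has maximal entropy (log 4 ~ 1.386);
# a peaked policy has near-zero entropy, so with equal expected
# return the uniform policy scores higher under the soft objective.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]
```

With \(\alpha = 0\) the soft objective reduces to the standard expected-return objective, which is exactly the contrast the objectives above ask for.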

March 10, 2026 · 3 min · 500 words · codefrydev

Chapter 47: Soft Actor-Critic (SAC)

Learning objectives
- Implement SAC (Soft Actor-Critic) for HalfCheetah: two Q-networks (take the min for the target), a policy that maximizes \(Q - \alpha \log \pi\), and automatic temperature tuning so \(\alpha\) targets a desired entropy.
- Train and compare sample efficiency with PPO (same env, same or similar compute).

Concept and real-world RL
SAC combines maximum entropy RL with actor-critic: the critic learns two Q-functions (taking the min for the target to reduce overestimation); the actor maximizes \(\mathbb{E}[ Q(s,a) - \alpha \log \pi(a|s) ]\); and \(\alpha\) is updated to keep the policy entropy near a target (e.g. \(-\dim(a)\), the negative action dimension). SAC is off-policy (replay buffer), so it is often more sample-efficient than PPO on continuous control. In robot control (HalfCheetah, Hopper, Walker), SAC is a standard baseline; in recommendation and trading, off-policy max-ent methods can improve exploration and stability. ...
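The two SAC-specific computations, the soft TD target and the temperature gradient, can be sketched without a deep learning framework. A minimal numerical sketch (the function names and the `log_alpha` parameterization are my own conventions, not from any particular codebase):

```python
import numpy as np

def soft_td_target(r, done, q1_next, q2_next, log_pi_next, alpha, gamma=0.99):
    """SAC critic target: r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')).
    The min over the twin target Q-networks reduces overestimation."""
    soft_value = np.minimum(q1_next, q2_next) - alpha * log_pi_next
    return r + gamma * (1.0 - done) * soft_value

def alpha_gradient(log_pi, log_alpha, target_entropy):
    """Gradient of the temperature loss J(alpha) = -alpha * E[log pi + H_target]
    w.r.t. log_alpha. Descending it raises alpha when policy entropy
    (-E[log pi]) is below the target, and lowers alpha when it is above."""
    return -np.exp(log_alpha) * np.mean(log_pi + target_entropy)
```

In a real implementation these scalars are batched tensors and the gradient step on `log_alpha` is taken by the optimizer; the signs and the twin-Q min are the parts worth checking by hand.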

March 10, 2026 · 3 min · 519 words · codefrydev

Chapter 48: SAC vs. PPO

Learning objectives
- Run SAC and PPO on the same continuous control tasks (e.g. Hopper, Walker2d).
- Compare final performance, sample efficiency (return vs env steps), and wall-clock time.
- Discuss when to choose one over the other (sample efficiency, stability, tuning, off-policy vs on-policy).

Concept and real-world RL
SAC is off-policy (replay buffer) and maximizes entropy; PPO is on-policy (rollouts) and uses a clipped objective. SAC often achieves higher sample efficiency (fewer env steps to reach good performance) but can be sensitive to hyperparameters and replay buffer size; PPO is more robust and easier to tune in many settings. In robot control benchmarks (Hopper, Walker2d, HalfCheetah), both are standard; in game AI and RLHF, PPO is more common. The choice depends on data cost (can we afford many env steps?), the need for off-policy learning (e.g. using logged data), and engineering preference. ...
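The two objectives being contrasted can be written side by side per sample. A sketch (function names are my own; PPO's ratio is \(r = \pi_\theta(a|s)/\pi_{\text{old}}(a|s)\)):

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """PPO per-sample surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A).
    The min makes the objective pessimistic: large policy-ratio moves
    never get extra credit, so updates stay near the old policy."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1 - eps, 1 + eps) * advantage)

def sac_actor_objective(q, log_pi, alpha):
    """SAC per-sample actor objective: Q(s,a) - alpha * log pi(a|s).
    There is no ratio or clipping: off-policy data from the replay
    buffer is used directly, with entropy as the regularizer."""
    return q - alpha * log_pi
```

This is the structural difference behind the trade-offs above: PPO constrains how far each update moves from the data-collecting policy, while SAC relies on the replay buffer and entropy bonus instead.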

March 10, 2026 · 3 min · 481 words · codefrydev

Chapter 49: Custom Gym Environments (Part 2)

Learning objectives
- Create a custom Gym environment: a 2D point mass that must navigate to a goal while avoiding an obstacle.
- Define a continuous action (e.g. force in x and y) and a reward function (e.g. distance to goal, with penalties for the obstacle or boundary).
- Test the environment with a SAC (or PPO) agent and verify that the agent can learn to reach the goal.

Concept and real-world RL
Custom environments let you model robot navigation, recommendation (state = user, action = item), or trading (state = market, action = trade). A 2D point mass is a minimal continuous control task: state = (x, y, vx, vy), action = (fx, fy), reward = -distance to goal + penalties. In robot control, similar point-mass or particle models are used for planning and RL; in game AI, custom envs are used for prototyping. Implementing the Gym interface (reset, step, observation_space, action_space) and testing with a known algorithm (SAC) validates the design. ...
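A dependency-free sketch of the point-mass dynamics and reward described above, following the Gymnasium-style `reset`/`step` interface (goal, obstacle, and penalty values are illustrative choices; a real version would subclass `gymnasium.Env` and declare `observation_space`/`action_space` as `spaces.Box`):

```python
import numpy as np

class PointMassEnv:
    """2D point mass: state (x, y, vx, vy), action (fx, fy) in [-1, 1]^2,
    reward = -distance to goal, minus a penalty inside a circular obstacle."""

    def __init__(self, goal=(1.0, 1.0), obstacle=(0.5, 0.5),
                 obstacle_radius=0.15, dt=0.05, max_steps=200):
        self.goal = np.array(goal)
        self.obstacle = np.array(obstacle)
        self.obstacle_radius = obstacle_radius
        self.dt = dt
        self.max_steps = max_steps

    def reset(self, seed=None):
        rng = np.random.default_rng(seed)
        self.state = np.zeros(4)
        self.state[:2] = rng.uniform(-0.1, 0.1, size=2)  # jittered start
        self.t = 0
        return self.state.copy(), {}

    def step(self, action):
        fx, fy = np.clip(action, -1.0, 1.0)
        x, y, vx, vy = self.state
        vx, vy = vx + fx * self.dt, vy + fy * self.dt   # force integrates to velocity
        x, y = x + vx * self.dt, y + vy * self.dt       # velocity integrates to position
        self.state = np.array([x, y, vx, vy])
        self.t += 1
        dist_goal = np.linalg.norm(self.state[:2] - self.goal)
        reward = -dist_goal
        if np.linalg.norm(self.state[:2] - self.obstacle) < self.obstacle_radius:
            reward -= 1.0  # obstacle penalty
        terminated = bool(dist_goal < 0.05)
        truncated = self.t >= self.max_steps
        return self.state.copy(), float(reward), terminated, truncated, {}
```

A quick sanity loop (random actions, check reward signs and episode termination) is worth running before handing the env to SAC.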

March 10, 2026 · 3 min · 525 words · codefrydev

Chapter 50: Advanced Hyperparameter Tuning

Learning objectives
- Use Weights & Biases (or a similar tool) to run a hyperparameter sweep for SAC on your custom environment (or a standard one).
- Sweep over learning rate, entropy coefficient (or the auto-\(\alpha\) target), and network size (hidden dims).
- Visualize the effect on final return and learning speed (e.g. steps to reach a threshold).

Concept and real-world RL
Hyperparameter tuning is essential for getting the best out of RL algorithms; sweeps (grid or random search over learning rate, network size, etc.) are standard in research and industry. Weights & Biases (wandb) logs metrics and supports sweep configs; similar tools include MLflow, Optuna, and Ray Tune. In robot control and game AI, tuning the learning rate and entropy coefficient (or clip range for PPO) often has the largest impact. Automating sweeps saves time and makes results reproducible. ...
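A sweep over the three quantities named above can be expressed as a config dict in the shape `wandb.sweep` accepts (the metric name `final_return` and the parameter names are placeholders for whatever your training script actually logs and reads):

```python
# Random-search sweep over SAC hyperparameters, wandb-style config.
# "final_return" must match a metric your script logs via wandb.log;
# parameter names must match what your script reads from wandb.config.
sweep_config = {
    "method": "random",
    "metric": {"name": "final_return", "goal": "maximize"},
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values", "min": 1e-4, "max": 1e-2,
        },
        "target_entropy": {"values": [-1.0, -2.0, -4.0]},
        "hidden_dim": {"values": [64, 128, 256]},
    },
}
```

Learning rates are swept on a log scale because their effect spans orders of magnitude; discrete `values` lists keep the entropy target and network size interpretable in the sweep dashboard.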

March 10, 2026 · 3 min · 473 words · codefrydev

Chapter 58: Model-Based Policy Optimization (MBPO)

Learning objectives
- Implement MBPO: learn an ensemble of dynamics models, generate short rollouts from real states, add the imagined transitions to the replay buffer, and train SAC on the combined buffer.
- Compare sample efficiency with SAC alone (same number of real env steps).
- Explain why short rollouts (e.g. 1–5 steps) help avoid compounding model error.

Concept and real-world RL
MBPO (Model-Based Policy Optimization) uses learned dynamics to augment the replay buffer: from a real state, roll the model out for a few steps and add the imagined (s, a, r, s') transitions to the buffer. SAC (or another off-policy method) then trains on real + imagined data. Short rollouts keep model error manageable. In robot control and trading, MBPO can significantly reduce the number of real steps needed to reach good performance. ...
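The rollout-generation step can be sketched independently of how the models are trained. A minimal sketch (the function name and the callable signatures `model(s, a) -> (s', r)` and `policy(s) -> a` are my own conventions, not from the MBPO codebase):

```python
import numpy as np

def imagined_rollouts(model_ensemble, policy, real_states, horizon=3, rng=None):
    """MBPO-style augmentation: from each real state, roll a randomly chosen
    ensemble member forward `horizon` steps under the current policy,
    collecting imagined (s, a, r, s') transitions. Short horizons (1-5)
    keep compounding model error small; sampling the member per step
    lets ensemble disagreement show up as diversity in the data."""
    if rng is None:
        rng = np.random.default_rng(0)
    buffer = []
    for s in real_states:
        for _ in range(horizon):
            a = policy(s)
            model = model_ensemble[rng.integers(len(model_ensemble))]
            s_next, r = model(s, a)
            buffer.append((s, a, r, s_next))
            s = s_next  # branch continues from the imagined state
    return buffer
```

In full MBPO these imagined transitions go into a separate model buffer and SAC samples a mixture of real and imagined data each update.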

March 10, 2026 · 3 min · 475 words · codefrydev

Chapter 71: The Offline RL Problem

Learning objectives
- Collect a dataset of transitions (state, action, reward, next_state, done) from a random policy (or a fixed behavior policy) in the Hopper environment.
- Train a standard SAC agent offline (no environment interaction) on this dataset and observe the overestimation of Q-values for out-of-distribution (OOD) actions.
- Explain why naive off-policy methods fail in offline RL: the policy is trained to maximize Q, but Q is only trained on in-distribution actions; for OOD actions Q can be overestimated.
- Identify the distributional shift between the behavior policy (which collected the data) and the learned policy.
- Relate the offline RL problem to recommendation and healthcare, where data comes from logs or historical trials.

Concept and real-world RL ...
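The OOD-overestimation failure mode can be reproduced in a few lines without any neural network: fit a critic only on in-distribution actions, then let policy improvement search over all actions. A toy sketch (the linear fit stands in for a function approximator; true values and ranges are made up for illustration):

```python
import numpy as np

# True action value is -a^2, but the behavior policy only tried a in [-0.4, 0.4].
a_data = np.linspace(-0.4, 0.4, 9)
true_q = lambda a: -a**2
coefs = np.polyfit(a_data, true_q(a_data), deg=1)  # "critic" fit on data only

# Policy improvement greedily maximizes the fitted Q over ALL actions,
# including ones the critic never saw.
a_grid = np.linspace(-2.0, 2.0, 401)
q_hat = np.polyval(coefs, a_grid)
greedy_a = a_grid[np.argmax(q_hat)]

# The linear fit extrapolates a near-flat value to a = 2, where the true
# value is -4: a large overestimate, and the greedy action is far OOD.
overestimate = np.polyval(coefs, 2.0) - true_q(2.0)
```

This is distributional shift in miniature: the critic is only accurate where the behavior policy put data, yet the learned policy is pushed exactly toward the regions where the critic is wrong.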

March 10, 2026 · 4 min · 723 words · codefrydev

Chapter 79: Offline-to-Online Finetuning

Learning objectives
- Pretrain a SAC (or similar) agent offline on a fixed dataset (e.g. from a mix of policies, or from Chapter 71).
- Finetune the agent online by continuing training with environment interaction.
- Compare the learning curve (return vs steps) of finetuning from offline pretraining vs training from scratch.
- Implement a Q-filter: when updating the policy, avoid or downweight updates that use actions for which Q is below a threshold (to avoid reinforcing "bad" actions that could destabilize the policy).
- Relate offline-to-online finetuning to recommendation (pretrain on logs, then A/B test) and healthcare (pretrain on historical data, then cautious online updates).

Concept and real-world RL ...
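One simple instantiation of the Q-filter idea is a binary mask on the policy update. A sketch under my own assumptions (the threshold rule and the weighted objective below are one illustrative choice, not a canonical definition):

```python
import numpy as np

def q_filter_weights(q_values, threshold):
    """Mask out policy-update terms for actions whose critic value falls
    below `threshold`, so online finetuning does not reinforce low-value
    actions that could destabilize the pretrained policy."""
    q_values = np.asarray(q_values, dtype=float)
    return (q_values >= threshold).astype(float)

def filtered_policy_loss(log_pi, q_values, threshold):
    """Negative of a masked, Q-weighted log-likelihood objective (sketch):
    maximize mean over kept samples of log pi(a|s) * Q(s,a)."""
    w = q_filter_weights(q_values, threshold)
    denom = max(w.sum(), 1.0)  # avoid division by zero if all filtered out
    return -float(np.sum(w * log_pi * q_values) / denom)
```

In practice the threshold might be a batch quantile of Q rather than a fixed constant, and the mask can be softened into a downweighting; the point is that filtered samples contribute no gradient pushing the policy toward low-Q actions.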

March 10, 2026 · 4 min · 756 words · codefrydev