SAC
A practical guide to reading reinforcement learning research papers: structure, notation, and three annotated examples (DQN, PPO, SAC).
5 quick questions covering Chapters 41–45 of Volume 5 to check you're ready to continue.
Max-entropy objective; why entropy encourages exploration.
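For reference, the maximum-entropy objective this lesson covers is conventionally written (standard SAC formulation, not quoted from the lesson itself) as:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

The $\alpha$-weighted entropy term rewards stochastic policies directly, so exploration comes from the objective itself rather than from externally injected noise.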
SAC for HalfCheetah with automatic temperature tuning.
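The automatic temperature mechanism can be sketched in a few lines. This is a minimal pure-Python illustration under assumed values; the learning rate, target entropy, and batch of log-probs are not taken from the lesson:

```python
import math

# Sketch of SAC's automatic temperature (alpha) tuning.
# TARGET_ENTROPY and LR are illustrative assumptions.
TARGET_ENTROPY = -1.0   # common heuristic: -|action_dim|
LR = 3e-3

def alpha_step(log_alpha, log_probs):
    """One gradient step on J(alpha) = E[-alpha * (log pi(a|s) + target_H)]."""
    alpha = math.exp(log_alpha)
    mean_term = sum(lp + TARGET_ENTROPY for lp in log_probs) / len(log_probs)
    grad = -alpha * mean_term            # d J / d log_alpha
    return log_alpha - LR * grad

# When the policy is less entropic than the target (log-probs too high),
# alpha rises, strengthening the entropy bonus.
log_alpha = 0.0
for _ in range(100):
    log_alpha = alpha_step(log_alpha, [1.5] * 32)
```

Optimizing over `log_alpha` rather than `alpha` keeps the temperature positive without explicit constraints, which is the usual implementation trick.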
Compare SAC and PPO on Hopper and Walker2d; when to choose which.
Custom 2D point-mass environment with continuous actions; test it with SAC.
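A point-mass environment of the kind this lesson builds could look like the following. This is a hypothetical minimal sketch with a Gym-style `reset`/`step` API; the dynamics constants and reward shaping are illustrative assumptions, not the lesson's code:

```python
import math
import random

class PointMass2D:
    """Agent applies a continuous 2D force; the goal is the origin."""
    DT = 0.05
    MAX_STEPS = 200

    def reset(self):
        self.pos = [random.uniform(-1, 1), random.uniform(-1, 1)]
        self.vel = [0.0, 0.0]
        self.t = 0
        return self.pos + self.vel          # 4D observation: [x, y, vx, vy]

    def step(self, action):
        # clip the continuous action to [-1, 1]^2, then integrate dynamics
        ax, ay = (max(-1.0, min(1.0, a)) for a in action)
        self.vel[0] += ax * self.DT
        self.vel[1] += ay * self.DT
        self.pos[0] += self.vel[0] * self.DT
        self.pos[1] += self.vel[1] * self.DT
        self.t += 1
        dist = math.hypot(self.pos[0], self.pos[1])
        reward = -dist                      # dense reward: negative distance to goal
        done = dist < 0.05 or self.t >= self.MAX_STEPS
        return self.pos + self.vel, reward, done, {}

env = PointMass2D()
obs = env.reset()
obs, reward, done, info = env.step([0.5, -0.5])
```

The dense negative-distance reward makes this an easy sanity check: SAC should drive the return toward zero within a few thousand steps.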
Weights & Biases sweep for SAC on custom env.
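A sweep of the kind this lesson describes is configured declaratively. The dictionary below is a hedged example of a W&B sweep config; the metric name, parameter names, and ranges are assumptions for illustration:

```python
# Hypothetical W&B sweep configuration for SAC hyperparameters.
# Keys follow the standard wandb sweep schema; values are illustrative.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "eval/return", "goal": "maximize"},
    "parameters": {
        "lr": {"distribution": "log_uniform_values", "min": 1e-4, "max": 1e-3},
        "tau": {"values": [0.005, 0.01]},
        "batch_size": {"values": [128, 256]},
    },
}
# To launch (requires a logged-in wandb client):
# sweep_id = wandb.sweep(sweep_config, project="sac-pointmass")
```

A log-uniform distribution for the learning rate is the usual choice, since plausible values span an order of magnitude.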
MBPO: ensemble dynamics model, short model rollouts, data added to the SAC buffer.
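The data-generation loop at the heart of MBPO can be sketched briefly. This is an illustrative skeleton with a stand-in model; the rollout length, function names, and dummy dynamics are assumptions, not the lesson's implementation:

```python
import random

ROLLOUT_LEN = 3  # short rollouts limit compounding model error

def fake_model(state, action):
    """Stand-in for an ensemble dynamics model: returns (next_state, reward)."""
    return state + action, -abs(state + action)

def generate_model_data(real_states, policy, model_buffer):
    # Branch a short imagined rollout from each real state and store
    # the transitions alongside real data for SAC to train on.
    for s in real_states:
        state = s
        for _ in range(ROLLOUT_LEN):
            a = policy(state)
            next_state, r = fake_model(state, a)
            model_buffer.append((state, a, r, next_state))
            state = next_state

buffer = []
generate_model_data([0.0, 1.0],
                    policy=lambda s: random.uniform(-1, 1),
                    model_buffer=buffer)
```

Branching many short rollouts from real states, rather than one long rollout, is the key MBPO design choice: model error grows with horizon, so short branches keep the imagined data trustworthy.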
Collect a random-policy dataset on Hopper; demonstrate naive SAC's Q-value overestimation.
Pretrain SAC offline, then fine-tune online; use a Q-filter to screen out bad actions.
Review Volume 5 (PPO, TRPO, SAC) and preview Volume 6 (Model-Based RL — learning world models and planning).