Volume 6: Model-Based RL & Planning
Chapters 51–60 — Model-free vs model-based RL, learning world models, planning with known models, Monte Carlo Tree Search, AlphaZero, MuZero, Dreamer, MBPO, PETS, and visualizing model rollouts.
Compare the sample efficiency of Dreamer (model-based) and PPO (model-free) on the Walker locomotion task.
Train a neural network to predict the next CartPole state from the current state and action, and observe how one-step prediction errors compound over multi-step rollouts.
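A minimal sketch of the compounding-error effect, using a toy pendulum-like system as a stand-in for CartPole and a linear least-squares model instead of a neural network (both are assumptions for brevity; the phenomenon is the same):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear dynamics as a stand-in for CartPole:
# state s = (angle, angular velocity), pendulum-style update.
def step(s):
    theta, omega = s
    return np.array([theta + 0.05 * omega, omega - 0.05 * np.sin(theta)])

# Collect one-step transitions from random starting states.
S = rng.uniform(-1, 1, size=(500, 2))
Y = np.array([step(s) for s in S])

# Fit a linear one-step model s' ~ s @ W by least squares
# (stand-in for the trained network).
W, *_ = np.linalg.lstsq(S, Y, rcond=None)

# Roll the learned model forward alongside the true dynamics:
# the one-step error is tiny, but it compounds over the horizon.
s_true = np.array([0.8, 0.0])
s_model = np.array([0.8, 0.0])
errors = []
for _ in range(50):
    s_true = step(s_true)
    s_model = s_model @ W
    errors.append(np.linalg.norm(s_true - s_model))

print(errors[0], errors[-1])  # multi-step error exceeds one-step error
```

The model is near-perfect for a single step, yet its small bias on the nonlinearity accumulates every step of the rollout, which is exactly why naive long-horizon planning with a learned model fails.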
Implement a BFS planner for a gridworld and compare its plans with those produced by dynamic programming (value iteration).
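A sketch of the BFS planner on a hypothetical 5x5 grid (the layout below is illustrative, not from the chapter); with a known deterministic model and unit costs, BFS and DP recover the same shortest path lengths:

```python
from collections import deque

# '#' marks walls; actions are the four compass moves.
GRID = ["....#",
        ".##.#",
        "....#",
        ".#...",
        "....."]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

def bfs_plan(start, goal):
    frontier = deque([start])
    parent = {start: None}  # cell -> (previous cell, action taken)
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            # Walk parent pointers back to recover the action sequence.
            plan = []
            while parent[cell] is not None:
                cell, action = parent[cell]
                plan.append(action)
            return plan[::-1]
        r, c = cell
        for action, (dr, dc) in MOVES.items():
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])
                    and GRID[nr][nc] != "#" and (nr, nc) not in parent):
                parent[(nr, nc)] = (cell, action)
                frontier.append((nr, nc))
    return None  # goal unreachable

plan = bfs_plan((0, 0), (4, 4))
print(len(plan))
```

Because BFS expands states in order of distance from the start, the first time it pops the goal it has found a shortest plan, matching the greedy policy extracted from a value-iteration solution.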
Implement MCTS for tic-tac-toe with UCT selection and evaluate it against a random player.
Build a mini AlphaZero for tic-tac-toe: a policy/value network guiding MCTS, trained by self-play.
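The key change from plain UCT is that AlphaZero's search scores children with PUCT, blending the network's value estimate Q with its policy prior P. A minimal sketch of that score (the constant c and the example numbers are illustrative):

```python
import math

# PUCT: Q(s,a) + c * P(s,a) * sqrt(N(s)) / (1 + N(s,a)).
# High-prior unvisited moves get explored before low-prior visited ones.
def puct(q, prior, n_parent, n_child, c=1.5):
    return q + c * prior * math.sqrt(n_parent) / (1 + n_child)

# An unvisited move with a strong prior outranks a mediocre visited move.
print(puct(0.0, 0.6, 100, 0) > puct(0.4, 0.1, 100, 10))  # True
```

In the full loop, self-play games are generated with this search, and the visit counts at each root become the policy targets for the next round of network training.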
Implement a simplified MuZero: learn the dynamics model in latent space and predict rewards without ever reconstructing observations.
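A shape-level sketch of the MuZero interface — a representation function h mapping observations to latents, and a dynamics function g mapping (latent, action) to (next latent, reward). Dimensions, the random weights, and the single-layer tanh networks are all placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

OBS, LATENT, ACTIONS = 4, 8, 3                      # hypothetical sizes
Wh = rng.normal(size=(OBS, LATENT))                 # representation h
Wg = rng.normal(size=(LATENT + ACTIONS, LATENT))    # dynamics g
wr = rng.normal(size=LATENT + ACTIONS)              # reward head

def represent(obs):
    return np.tanh(obs @ Wh)

def dynamics(latent, action):
    x = np.concatenate([latent, np.eye(ACTIONS)[action]])  # one-hot action
    return np.tanh(x @ Wg), float(x @ wr)  # next latent, predicted reward

# Unroll the model for K steps purely in latent space: no environment
# state is ever reconstructed, only rewards (and, in full MuZero,
# values and policies) are predicted at each step.
latent = represent(rng.normal(size=OBS))
rewards = []
for action in [0, 2, 1]:
    latent, r = dynamics(latent, action)
    rewards.append(r)
print(len(rewards), latent.shape)
```

Training would push these predicted rewards (plus value and policy heads) toward targets from real trajectories; the latent space is free to encode only what those predictions need.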
Implement a simplified Dreamer: a recurrent state-space model (RSSM), an imagination phase that rolls out latent trajectories, and an actor-critic trained on them.
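A structural sketch of the RSSM imagination rollout: a deterministic recurrent state h plus a stochastic latent z sampled from a prior conditioned on h. The dimensions, the single tanh layer standing in for a GRU, and the fixed prior noise scale are all simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

H, Z, A = 16, 4, 2                            # hypothetical RSSM sizes
Wh = rng.normal(size=(H + Z + A, H)) * 0.1    # recurrent update
Wmu = rng.normal(size=(H, Z)) * 0.1           # prior mean of z given h

def rssm_step(h, z, a):
    # Deterministic path (a GRU in the real model, one tanh layer here).
    h_next = np.tanh(np.concatenate([h, z, a]) @ Wh)
    # Stochastic path: sample the next latent from the prior p(z | h).
    z_next = h_next @ Wmu + 0.1 * rng.normal(size=Z)
    return h_next, z_next

# Imagination phase: roll latent states forward under the policy without
# touching the real environment; the actor-critic trains on this.
h, z = np.zeros(H), np.zeros(Z)
trajectory = []
for _ in range(15):
    a = rng.normal(size=A)                    # stand-in for the actor output
    h, z = rssm_step(h, z, a)
    trajectory.append((h, z))
print(len(trajectory), trajectory[-1][0].shape)
```

During world-model training a posterior over z (conditioned on the observation) replaces the prior sample, and the KL between the two is part of the loss; imagination uses only the prior, as above.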
Implement MBPO: an ensemble of dynamics models, short model rollouts branched from real states, and a SAC agent trained on the combined replay buffer.
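A sketch of MBPO's central mechanic, branched short rollouts, using a toy 1-D system; the biased closures standing in for learned ensemble members, the rollout length k=3, and the random actions standing in for the SAC actor are all assumptions:

```python
import random

random.seed(0)

# Each "ensemble member" is the true dynamics plus its own bias,
# mimicking the disagreement of independently trained models.
def make_member(bias):
    return lambda s, a: (s + a + bias, -abs(s))  # (next state, reward)

ensemble = [make_member(b) for b in (-0.05, 0.0, 0.05)]

real_buffer = [random.uniform(-1, 1) for _ in range(100)]  # real-env states
model_buffer = []

# MBPO-style branching: start a short k-step rollout from a *real* state,
# picking a random ensemble member at every step. Short horizons keep
# compounding model error bounded.
K = 3
for _ in range(200):
    s = random.choice(real_buffer)
    for _ in range(K):
        a = random.uniform(-0.1, 0.1)            # stand-in for the SAC actor
        s_next, r = random.choice(ensemble)(s, a)
        model_buffer.append((s, a, r, s_next))   # feeds the SAC replay buffer
        s = s_next

print(len(model_buffer))  # 200 rollouts x 3 steps = 600 transitions
```

SAC then trains on a mixture of `model_buffer` and real transitions; because rollouts are short and restart from fresh real states, the model data stays close to the true state distribution.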
Implement PETS: an ensemble of probabilistic dynamics models with model-predictive control via random shooting.
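A sketch of the random-shooting MPC loop on a toy 1-D system (the closures standing in for trained ensemble members, the horizon, sample count, and action bounds are all illustrative assumptions; real PETS uses probabilistic networks and typically CEM rather than pure random shooting):

```python
import random

random.seed(0)

# Toy dynamics: the state drifts by the action; each ensemble member
# carries its own bias, standing in for learned-model disagreement.
def make_member(bias):
    return lambda s, a: s + a + bias

ensemble = [make_member(b) for b in (-0.02, 0.0, 0.02)]

def evaluate(s0, plan):
    # Average the plan's running cost |s| across ensemble members.
    total = 0.0
    for model in ensemble:
        s = s0
        for a in plan:
            s = model(s, a)
            total += abs(s)
    return total / len(ensemble)

def mpc_random_shooting(s0, horizon=5, n_samples=1000):
    best_plan, best_cost = None, float("inf")
    for _ in range(n_samples):
        plan = [random.uniform(-0.3, 0.3) for _ in range(horizon)]
        cost = evaluate(s0, plan)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan[0]  # MPC executes only the first action, then replans

first_action = mpc_random_shooting(s0=1.0)
print(first_action)
```

From s = 1 with cost |s|, a good plan pushes toward 0, so the chosen first action should be negative; replanning at every step is what makes the open-loop random plans usable as a closed-loop controller.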
Plot true vs model-predicted state trajectories to visualize how model error compounds with rollout length.
Review Volume 6 (Model-Based RL, MCTS, Dyna-Q, world models) and preview Volume 7 (Exploration — intrinsic motivation, curiosity, and sparse rewards).