Volume 6: Model-Based RL & Planning
Chapters 51–60 — Model-free vs model-based RL, learning world models, planning with known models, Monte Carlo Tree Search, AlphaZero, MuZero, Dreamer, MBPO, PETS, and visualizing model rollouts.
Compare the sample efficiency of Dreamer (model-based) and PPO (model-free) on the Walker locomotion task.
Train a neural network to predict the next CartPole state from the current state and action, and observe how one-step prediction errors compound over multi-step rollouts.
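A minimal sketch of the compounding-error effect, using a toy pendulum-like system as a stand-in for CartPole and a linear least-squares model instead of a neural network (both are assumptions for brevity; the phenomenon is the same):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear dynamics as a stand-in for CartPole:
# state s = (angle, angular velocity), pendulum-style update.
def step(s):
    theta, omega = s
    return np.array([theta + 0.05 * omega, omega - 0.05 * np.sin(theta)])

# Collect one-step transitions from random starting states.
S = rng.uniform(-1, 1, size=(500, 2))
Y = np.array([step(s) for s in S])

# Fit a linear one-step model s' ~ s @ W by least squares
# (stand-in for the trained network).
W, *_ = np.linalg.lstsq(S, Y, rcond=None)

# Roll the learned model forward alongside the true dynamics:
# the one-step error is tiny, but it compounds over the horizon.
s_true = np.array([0.8, 0.0])
s_model = np.array([0.8, 0.0])
errors = []
for _ in range(50):
    s_true = step(s_true)
    s_model = s_model @ W
    errors.append(np.linalg.norm(s_true - s_model))

print(errors[0], errors[-1])  # multi-step error exceeds one-step error
```

The model is near-perfect for a single step, yet its small bias on the nonlinearity accumulates every step of the rollout, which is exactly why naive long-horizon planning with a learned model fails.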
Implement a BFS planner for a gridworld and compare its plans with those produced by dynamic programming (value iteration).
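A sketch of the BFS planner on a hypothetical 5x5 grid (the layout below is illustrative, not from the chapter); with a known deterministic model and unit costs, BFS and DP recover the same shortest path lengths:

```python
from collections import deque

# '#' marks walls; actions are the four compass moves.
GRID = ["....#",
        ".##.#",
        "....#",
        ".#...",
        "....."]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

def bfs_plan(start, goal):
    frontier = deque([start])
    parent = {start: None}  # cell -> (previous cell, action taken)
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            # Walk parent pointers back to recover the action sequence.
            plan = []
            while parent[cell] is not None:
                cell, action = parent[cell]
                plan.append(action)
            return plan[::-1]
        r, c = cell
        for action, (dr, dc) in MOVES.items():
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])
                    and GRID[nr][nc] != "#" and (nr, nc) not in parent):
                parent[(nr, nc)] = (cell, action)
                frontier.append((nr, nc))
    return None  # goal unreachable

plan = bfs_plan((0, 0), (4, 4))
print(len(plan))
```

Because BFS expands states in order of distance from the start, the first time it pops the goal it has found a shortest plan, matching the greedy policy extracted from a value-iteration solution.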
Implement MCTS for tic-tac-toe with UCT selection and evaluate it against a random player.
Build a mini AlphaZero for tic-tac-toe: a policy/value network guiding MCTS, trained by self-play.
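The key change from plain UCT is that AlphaZero's search scores children with PUCT, blending the network's value estimate Q with its policy prior P. A minimal sketch of that score (the constant c and the example numbers are illustrative):

```python
import math

# PUCT: Q(s,a) + c * P(s,a) * sqrt(N(s)) / (1 + N(s,a)).
# High-prior unvisited moves get explored before low-prior visited ones.
def puct(q, prior, n_parent, n_child, c=1.5):
    return q + c * prior * math.sqrt(n_parent) / (1 + n_child)

# An unvisited move with a strong prior outranks a mediocre visited move.
print(puct(0.0, 0.6, 100, 0) > puct(0.4, 0.1, 100, 10))  # True
```

In the full loop, self-play games are generated with this search, and the visit counts at each root become the policy targets for the next round of network training.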
Implement a simplified MuZero: learn the dynamics model in latent space and predict rewards without ever reconstructing observations.
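A shape-level sketch of the MuZero interface — a representation function h mapping observations to latents, and a dynamics function g mapping (latent, action) to (next latent, reward). Dimensions, the random weights, and the single-layer tanh networks are all placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

OBS, LATENT, ACTIONS = 4, 8, 3                      # hypothetical sizes
Wh = rng.normal(size=(OBS, LATENT))                 # representation h
Wg = rng.normal(size=(LATENT + ACTIONS, LATENT))    # dynamics g
wr = rng.normal(size=LATENT + ACTIONS)              # reward head

def represent(obs):
    return np.tanh(obs @ Wh)

def dynamics(latent, action):
    x = np.concatenate([latent, np.eye(ACTIONS)[action]])  # one-hot action
    return np.tanh(x @ Wg), float(x @ wr)  # next latent, predicted reward

# Unroll the model for K steps purely in latent space: no environment
# state is ever reconstructed, only rewards (and, in full MuZero,
# values and policies) are predicted at each step.
latent = represent(rng.normal(size=OBS))
rewards = []
for action in [0, 2, 1]:
    latent, r = dynamics(latent, action)
    rewards.append(r)
print(len(rewards), latent.shape)
```

Training would push these predicted rewards (plus value and policy heads) toward targets from real trajectories; the latent space is free to encode only what those predictions need.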
Implement a simplified Dreamer: a recurrent state-space model (RSSM), an imagination phase that rolls out latent trajectories, and an actor-critic trained on them.
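A structural sketch of the RSSM imagination rollout: a deterministic recurrent state h plus a stochastic latent z sampled from a prior conditioned on h. The dimensions, the single tanh layer standing in for a GRU, and the fixed prior noise scale are all simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

H, Z, A = 16, 4, 2                            # hypothetical RSSM sizes
Wh = rng.normal(size=(H + Z + A, H)) * 0.1    # recurrent update
Wmu = rng.normal(size=(H, Z)) * 0.1           # prior mean of z given h

def rssm_step(h, z, a):
    # Deterministic path (a GRU in the real model, one tanh layer here).
    h_next = np.tanh(np.concatenate([h, z, a]) @ Wh)
    # Stochastic path: sample the next latent from the prior p(z | h).
    z_next = h_next @ Wmu + 0.1 * rng.normal(size=Z)
    return h_next, z_next

# Imagination phase: roll latent states forward under the policy without
# touching the real environment; the actor-critic trains on this.
h, z = np.zeros(H), np.zeros(Z)
trajectory = []
for _ in range(15):
    a = rng.normal(size=A)                    # stand-in for the actor output
    h, z = rssm_step(h, z, a)
    trajectory.append((h, z))
print(len(trajectory), trajectory[-1][0].shape)
```

During world-model training a posterior over z (conditioned on the observation) replaces the prior sample, and the KL between the two is part of the loss; imagination uses only the prior, as above.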
Implement MBPO: an ensemble of dynamics models, short model rollouts branched from real states, and a SAC agent trained on the combined replay buffer.
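A sketch of MBPO's central mechanic, branched short rollouts, using a toy 1-D system; the biased closures standing in for learned ensemble members, the rollout length k=3, and the random actions standing in for the SAC actor are all assumptions:

```python
import random

random.seed(0)

# Each "ensemble member" is the true dynamics plus its own bias,
# mimicking the disagreement of independently trained models.
def make_member(bias):
    return lambda s, a: (s + a + bias, -abs(s))  # (next state, reward)

ensemble = [make_member(b) for b in (-0.05, 0.0, 0.05)]

real_buffer = [random.uniform(-1, 1) for _ in range(100)]  # real-env states
model_buffer = []

# MBPO-style branching: start a short k-step rollout from a *real* state,
# picking a random ensemble member at every step. Short horizons keep
# compounding model error bounded.
K = 3
for _ in range(200):
    s = random.choice(real_buffer)
    for _ in range(K):
        a = random.uniform(-0.1, 0.1)            # stand-in for the SAC actor
        s_next, r = random.choice(ensemble)(s, a)
        model_buffer.append((s, a, r, s_next))   # feeds the SAC replay buffer
        s = s_next

print(len(model_buffer))  # 200 rollouts x 3 steps = 600 transitions
```

SAC then trains on a mixture of `model_buffer` and real transitions; because rollouts are short and restart from fresh real states, the model data stays close to the true state distribution.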
Implement PETS: an ensemble of probabilistic dynamics models with model-predictive control via random shooting.
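A sketch of the random-shooting MPC loop on a toy 1-D system (the closures standing in for trained ensemble members, the horizon, sample count, and action bounds are all illustrative assumptions; real PETS uses probabilistic networks and typically CEM rather than pure random shooting):

```python
import random

random.seed(0)

# Toy dynamics: the state drifts by the action; each ensemble member
# carries its own bias, standing in for learned-model disagreement.
def make_member(bias):
    return lambda s, a: s + a + bias

ensemble = [make_member(b) for b in (-0.02, 0.0, 0.02)]

def evaluate(s0, plan):
    # Average the plan's running cost |s| across ensemble members.
    total = 0.0
    for model in ensemble:
        s = s0
        for a in plan:
            s = model(s, a)
            total += abs(s)
    return total / len(ensemble)

def mpc_random_shooting(s0, horizon=5, n_samples=1000):
    best_plan, best_cost = None, float("inf")
    for _ in range(n_samples):
        plan = [random.uniform(-0.3, 0.3) for _ in range(horizon)]
        cost = evaluate(s0, plan)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan[0]  # MPC executes only the first action, then replans

first_action = mpc_random_shooting(s0=1.0)
print(first_action)
```

From s = 1 with cost |s|, a good plan pushes toward 0, so the chosen first action should be negative; replanning at every step is what makes the open-loop random plans usable as a closed-loop controller.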
Plot true vs model-predicted state trajectories to visualize how model error compounds with rollout length.
Review Volume 6 (Model-Based RL, MCTS, Dyna-Q, world models) and preview Volume 7 (Exploration — intrinsic motivation, curiosity, and sparse rewards).