Model-Based
Compare Dreamer and PPO sample efficiency on Walker.
MuZero: learned model in latent space; predicts reward rather than reconstructing observations.
Plot true vs. predicted states to visualize compounding error.
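As a rough illustration of the compounding-error item above, the sketch below rolls out a slightly wrong model next to the true dynamics. Everything in it is an assumption made for the example: the toy system (a damped 2-D rotation), the helper names `rotation` and `rollout_errors`, and the stand-in "learned" model, which is simply the true dynamics with a small angle error. It compares open-loop prediction (the model is fed its own outputs) with one-step prediction (the model is fed the true state at every step).

```python
import numpy as np

def rotation(theta, decay=0.99):
    """Damped 2-D rotation matrix, used here as toy linear dynamics."""
    c, s = np.cos(theta), np.sin(theta)
    return decay * np.array([[c, -s], [s, c]])

def rollout_errors(A_true, A_model, s0, horizon):
    """Return (open_loop_err, one_step_err), each an array of length `horizon`."""
    s_true = s0.copy()
    s_open = s0.copy()  # open-loop state: the model is fed its own predictions
    open_loop, one_step = [], []
    for _ in range(horizon):
        s_one = A_model @ s_true   # one-step prediction from the true state
        s_open = A_model @ s_open  # open-loop prediction from the last prediction
        s_true = A_true @ s_true   # ground-truth next state
        open_loop.append(np.linalg.norm(s_open - s_true))
        one_step.append(np.linalg.norm(s_one - s_true))
    return np.array(open_loop), np.array(one_step)

A_true = rotation(0.10)
A_model = rotation(0.12)  # hypothetical learned model with a small angle error
s0 = np.array([1.0, 0.0])
open_loop, one_step = rollout_errors(A_true, A_model, s0, horizon=50)
```

In this toy setting the one-step error stays small at every step, while the open-loop error keeps growing over the 50-step horizon because each prediction inherits the error of the previous one. Passing `open_loop` and `one_step` to `matplotlib.pyplot.plot` on a shared axis is one way to produce the visualization.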
Review Volume 5 (PPO, TRPO, SAC) and preview Volume 6 (Model-Based RL — learning world models and planning).
Review Volume 6 (Model-Based RL, MCTS, Dyna-Q, world models) and preview Volume 7 (Exploration — intrinsic motivation, curiosity, and sparse rewards).