Chapter 51: Model-Free vs. Model-Based RL

Learning objectives

- Compare model-free (e.g. PPO) and model-based (e.g. Dreamer) RL in terms of sample efficiency on a continuous control task such as Walker.
- Explain why model-based methods can achieve more reward per real environment step (use of imagined rollouts).
- Identify the trade-offs: model bias, computation, and implementation complexity.

Concept and real-world RL

Model-free methods learn a policy or value function directly from experience; model-based methods learn a dynamics model and use it for planning or for imagined rollouts. Model-based RL can be more sample-efficient because each real transition can be reused many times through the model (short rollouts, planning). In robot navigation and trading, where real data is expensive, sample efficiency matters; in game AI, model-based methods (e.g. MuZero) combine learning and planning. The downsides are model error (which compounds over long rollouts) and extra computation. ...
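The sample-reuse argument can be made concrete with a toy sketch: collect a fixed budget of real transitions, fit a simple dynamics model to them, and then generate many imagined steps from that model. The one-dimensional environment, the linear model, and the rollout lengths below are all illustrative assumptions, not part of any particular algorithm.

```python
import numpy as np

# Toy 1-D environment: the state decays toward 0; reward is -|state|.
def env_step(s, a):
    s_next = 0.9 * s + a
    return s_next, -abs(s_next)

rng = np.random.default_rng(0)

# Collect a small batch of real transitions (a model-free learner
# would have only these to update from).
real = []
s = 1.0
for _ in range(20):
    a = rng.uniform(-0.5, 0.5)
    s_next, _ = env_step(s, a)
    real.append((s, a, s_next))
    s = s_next

# Fit a linear dynamics model s' ~ w[0]*s + w[1]*a by least squares.
X = np.array([[s, a] for s, a, _ in real])
y = np.array([s_next for _, _, s_next in real])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Reuse each real state as the start of a short imagined rollout.
# Keeping rollouts short limits compounding model error.
imagined_steps = 0
for start_s, _, _ in real:
    s_img = start_s
    for _ in range(5):
        a = rng.uniform(-0.5, 0.5)
        s_img = w[0] * s_img + w[1] * a  # model step, no env call
        imagined_steps += 1

print(len(real), imagined_steps)  # 20 real steps yield 100 imagined steps
```

The 5x ratio here is arbitrary; the point is that the model turns a fixed real-data budget into an arbitrarily large stream of (biased) training transitions, which is exactly the trade-off the objectives above name.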

March 10, 2026 · 3 min · 446 words · codefrydev

Chapter 57: Dreamer and Latent Imagination

Learning objectives

- Implement a simplified Dreamer-style algorithm: train an RSSM-like model on collected trajectories, then roll out in latent space to train an actor-critic.
- Understand the imagination phase: no real environment steps; only latent rollouts are used for policy updates.
- Relate this to robot control and sample-efficient RL.

Concept and real-world RL

Dreamer learns a recurrent state-space model (RSSM) in latent space: it encodes each observation to a latent state, predicts the next latent given the action, and predicts the reward and a continue flag. The actor-critic is trained on imagined rollouts (latent only), so many gradient steps require no real environment interaction. In robot navigation and game AI, this yields high sample efficiency. The key is training the model and the policy on the same data so that the latent space is useful for control. ...
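The two-phase structure (model training on real data, then policy improvement purely in imagination) can be sketched in a few lines. This is a drastic simplification, not Dreamer itself: the encoder is the identity, the "RSSM" is a fitted linear latent model, the reward head is hard-coded, and the actor is improved by scoring a few candidate gains on imagined returns rather than by gradient ascent. A counter verifies that the imagination phase makes zero real environment calls.

```python
import numpy as np

env_calls = 0  # tracks real environment interactions

def env_step(s, a):
    global env_calls
    env_calls += 1
    s_next = 0.9 * s + a
    return s_next, -abs(s_next)

rng = np.random.default_rng(1)

# Phase 1: collect real trajectories to train the world model.
data = []
s = 1.0
for _ in range(30):
    a = rng.uniform(-1.0, 1.0)
    s_next, _ = env_step(s, a)
    data.append((s, a, s_next))
    s = s_next
real_steps = env_calls

# Fit latent dynamics z' ~ w[0]*z + w[1]*a (identity encoder here,
# standing in for the RSSM's learned transition model).
X = np.array([[s, a] for s, a, _ in data])
y = np.array([sn for _, _, sn in data])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Phase 2: imagination. Evaluate candidate policies a = -k*z using
# latent rollouts only; a learned reward head would replace -|z|.
def imagined_return(k, z0=1.0, horizon=10):
    z, ret = z0, 0.0
    for _ in range(horizon):
        a = -k * z
        z = w[0] * z + w[1] * a  # latent step, no env_step call
        ret += -abs(z)
    return ret

best_k = max(np.linspace(0.0, 1.5, 16), key=imagined_return)

# Policy improvement consumed no additional real env steps.
assert env_calls == real_steps
```

The candidate-gain search stands in for the actor-critic update; in Dreamer proper, the actor and critic are trained by backpropagating value estimates through the latent rollouts, but the accounting is the same: all policy-learning steps happen inside the model.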

March 10, 2026 · 3 min · 464 words · codefrydev