Chapter 51: Model-Free vs. Model-Based RL
Learning objectives

- Compare model-free (e.g., PPO) and model-based (e.g., Dreamer) RL in terms of sample efficiency on a continuous control task such as Walker.
- Explain why model-based methods can achieve more reward per real environment step (use of imagined rollouts).
- Identify trade-offs: model bias, computation, and implementation complexity.

Concept and real-world RL

Model-free methods learn a policy or value function directly from experience; model-based methods first learn a dynamics model and then use it for planning or imagined rollouts. Model-based RL can be more sample-efficient because each real transition can be reused many times through the model (short rollouts, planning). In robot navigation and trading, where real data is expensive, sample efficiency matters; in game AI, model-based methods (e.g., MuZero) combine learning and planning. The downsides are model error, which compounds over long rollouts, and extra computation.

...
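To make the sample-efficiency argument concrete, here is a minimal sketch (not Dreamer or any specific algorithm) on a hypothetical 1-D toy environment with linear dynamics. It fits a dynamics model to a small buffer of real transitions, then generates many short imagined rollouts from states in that buffer, so each expensive real step seeds many cheap model steps. The environment, the 10-rollouts-per-state and horizon-5 settings are illustrative assumptions.

```python
import random

# Hypothetical toy 1-D environment: s' = 0.9*s + a (unknown to the agent)
def real_step(s, a):
    s_next = 0.9 * s + a
    return s_next, -abs(s_next)  # reward: stay near the origin

# Collect a small batch of real transitions (expensive in the real world)
random.seed(0)
real_data = []
s = 1.0
for _ in range(20):
    a = random.uniform(-1, 1)
    s_next, r = real_step(s, a)
    real_data.append((s, a, s_next))
    s = s_next

# Fit a linear dynamics model s' ~ w_s*s + w_a*a by least squares
# (normal equations solved directly for the two-parameter case)
Sss = sum(s * s for s, a, y in real_data)
Saa = sum(a * a for s, a, y in real_data)
Ssa = sum(s * a for s, a, y in real_data)
Ssy = sum(s * y for s, a, y in real_data)
Say = sum(a * y for s, a, y in real_data)
det = Sss * Saa - Ssa * Ssa
w_s = (Ssy * Saa - Say * Ssa) / det
w_a = (Sss * Say - Ssa * Ssy) / det

def model_step(s, a):
    # Imagined transition: no real environment interaction needed
    return w_s * s + w_a * a

# Each real state seeds many short imagined rollouts; the short
# horizon limits how far model error can compound.
imagined = 0
for s0, _, _ in real_data:
    for _ in range(10):           # 10 imagined rollouts per real state
        s_im = s0
        for _ in range(5):        # horizon of 5 imagined steps
            s_im = model_step(s_im, random.uniform(-1, 1))
            imagined += 1

print(f"real steps: {len(real_data)}, imagined steps: {imagined}")
```

With 20 real transitions the sketch produces 1,000 imagined transitions, a 50x reuse factor. In a full model-based agent the imagined rollouts would feed a policy or value update; here the point is only the real-vs-imagined step count, and the fidelity of those imagined steps is exactly where model bias enters.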