Chapter 52: Learning World Models

Learning objectives

Collect random trajectories from CartPole and train a neural network to predict the next state given (state, action).
Evaluate prediction accuracy over 1 step, 5 steps, and 10 steps; observe compounding error as the horizon grows.
Relate model error to the limitations of long-horizon model-based rollouts.

Concept and real-world RL

A world model (or dynamics model) predicts \(s_{t+1}\) from \((s_t, a_t)\). It can be trained on collected transitions with a regression loss such as mean squared error (MSE). Errors compound over multi-step rollouts because each predicted state is fed back in as the input to the next prediction, so a small 1-step error becomes large after many steps. In robot navigation, learned models are used for short-horizon planning; in game AI (e.g. Dreamer), the model operates in a learned latent space to reduce dimensionality and keep rollouts manageable. Understanding compounding error is key to designing model-based algorithms. ...
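The exercise described in the learning objectives can be sketched roughly as follows. This is a minimal, dependency-light illustration and not the post's own code. It rests on several assumptions: the CartPole equations are re-implemented by hand from the classic Gym task instead of calling an environment library, a linear least-squares model stands in for the neural network so that only NumPy is needed, and the names `true_step`, `collect`, `fit`, `model_step`, and `rollout_error` are hypothetical helpers invented for this sketch.

```python
import numpy as np

# Constants of the classic CartPole task (HALF_LEN is half the pole length).
GRAVITY, M_CART, M_POLE, HALF_LEN, FORCE, DT = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02

def true_step(s, a):
    """Ground-truth dynamics (Euler step). a is 0 (push left) or 1 (push right)."""
    x, x_dot, th, th_dot = s
    f = FORCE if a == 1 else -FORCE
    total = M_CART + M_POLE
    tmp = (f + M_POLE * HALF_LEN * th_dot**2 * np.sin(th)) / total
    th_acc = (GRAVITY * np.sin(th) - np.cos(th) * tmp) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * np.cos(th) ** 2 / total))
    x_acc = tmp - M_POLE * HALF_LEN * th_acc * np.cos(th) / total
    return np.array([x + DT * x_dot, x_dot + DT * x_acc,
                     th + DT * th_dot, th_dot + DT * th_acc])

def collect(n_episodes, rng, max_steps=50):
    """Random-policy transitions (s, a, s'); an episode ends when the pole falls."""
    S, A, S2 = [], [], []
    for _ in range(n_episodes):
        s = rng.uniform(-0.05, 0.05, size=4)
        for _ in range(max_steps):
            a = int(rng.integers(2))
            s2 = true_step(s, a)
            S.append(s); A.append(a); S2.append(s2)
            s = s2
            if abs(s[0]) > 2.4 or abs(s[2]) > 0.2095:
                break
    return np.array(S), np.array(A), np.array(S2)

def fit(S, A, S2):
    """Assumption: linear least squares on [s, signed action, 1] -> (s' - s),
    used here in place of a neural network. Minimises the same MSE objective."""
    X = np.column_stack([S, 2 * A - 1, np.ones(len(S))])
    W, *_ = np.linalg.lstsq(X, S2 - S, rcond=None)
    return W  # shape (6, 4)

def model_step(s, a, W):
    """One-step prediction of the learned model."""
    return s + np.append(s, [2 * a - 1, 1.0]) @ W

def rollout_error(W, starts, horizon, rng):
    """Mean L2 error after `horizon` steps, feeding the model its own predictions."""
    errs = []
    for s0 in starts:
        s_true, s_model = s0.copy(), s0.copy()
        for _ in range(horizon):
            a = int(rng.integers(2))
            s_true = true_step(s_true, a)
            s_model = model_step(s_model, a, W)
        errs.append(np.linalg.norm(s_model - s_true))
    return float(np.mean(errs))

rng = np.random.default_rng(0)
S, A, S2 = collect(200, rng)        # training transitions
W = fit(S, A, S2)
test_S, _, _ = collect(50, rng)     # held-out start states
errors = {h: rollout_error(W, test_S[:200], h, rng) for h in (1, 5, 10)}
for h, e in errors.items():
    print(f"{h:2d}-step mean L2 error: {e:.2e}")
```

The key detail is in `rollout_error`: the model is advanced from its own previous prediction while the true system is advanced from the true state, which is what lets a small 1-step error accumulate over the horizon. Swapping `fit` and `model_step` for a small neural network would leave the evaluation loop unchanged.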

March 10, 2026 · 3 min · 442 words · codefrydev