Chapter 52: Learning World Models

Learning objectives: Collect random trajectories from CartPole and train a neural network to predict the next state given (state, action). Evaluate prediction accuracy over 1-, 5-, and 10-step horizons, and observe how error compounds as the horizon grows. Relate model error to the limitations of long-horizon model-based rollouts.

Concept and real-world RL: A world model (or dynamics model) predicts \(s_{t+1}\) from \((s_t, a_t)\). It can be trained on collected transitions with a regression loss such as MSE. Errors compound over multi-step rollouts: a small one-step error becomes large after many steps. In robot navigation, learned models are used for short-horizon planning; in game AI (e.g. Dreamer), models operate in latent space to reduce dimensionality and keep rollouts controllable. Understanding compounding error is key to designing model-based algorithms. ...
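The train-then-roll-out loop above can be sketched in a few lines. This is a minimal, self-contained illustration, not the chapter's actual code: a toy two-dimensional system stands in for CartPole, and a least-squares linear model stands in for the neural network, so no gym or deep-learning dependency is needed. The names (`step`, `predict`, `rollout_error`) are made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(s, a):
    # Toy nonlinear dynamics standing in for CartPole (hypothetical):
    # a 2-d state driven by a scalar action in {-1, +1}.
    A = np.array([[1.0, 0.05], [-0.05, 1.0]])
    return A @ s + 0.1 * a * np.array([0.0, 1.0]) + 0.01 * np.sin(s)

# Collect random-policy transitions (s, a, s').
S, A_, S1 = [], [], []
s = rng.normal(size=2)
for _ in range(2000):
    a = rng.choice([-1.0, 1.0])
    s1 = step(s, a)
    S.append(s); A_.append(a); S1.append(s1)
    s = s1 if np.all(np.abs(s1) < 5) else rng.normal(size=2)

X = np.column_stack([np.array(S), np.array(A_)])  # inputs (s, a)
Y = np.array(S1)                                  # targets s'

# Fit a linear one-step model by least squares (stand-in for the MLP + MSE).
W, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(len(X))]), Y, rcond=None)

def predict(s, a):
    return np.concatenate([s, [a], [1.0]]) @ W

def rollout_error(k, trials=50):
    # Mean k-step open-loop error: feed the model its own predictions.
    errs = []
    for _ in range(trials):
        s_true = rng.normal(size=2)
        s_pred = s_true.copy()
        for a in rng.choice([-1.0, 1.0], size=k):
            s_true = step(s_true, a)
            s_pred = predict(s_pred, a)
        errs.append(np.linalg.norm(s_true - s_pred))
    return float(np.mean(errs))

for k in (1, 5, 10):
    print(f"{k}-step mean error: {rollout_error(k):.4f}")
```

Because the linear model cannot capture the sine term, its small one-step residual is fed back into itself during rollouts, so the 10-step error is noticeably larger than the 1-step error, which is exactly the compounding effect the chapter measures.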

March 10, 2026 · 3 min · 442 words · codefrydev

Chapter 60: Visualizing Model-Based Rollouts

Learning objectives: For a learned dynamics model (e.g. from Chapter 52), sample a starting state and generate a rollout of predicted states for a fixed action sequence. Plot the true states (from the environment) and the predicted states (from the model) on the same axes to visualize compounding error. Interpret the plot: where does the model diverge from reality?

Concept and real-world RL: Visualizing model rollouts against real rollouts makes compounding error concrete: small one-step errors accumulate and the predicted trajectory drifts away from the true one. In robot navigation and model-based RL, this motivates short rollouts, ensemble methods, and uncertainty-aware planning. The same idea applies to trading models (predictions diverge over time) and to dialogue (conversation dynamics). ...
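The visualization step can be sketched as follows. This is a minimal illustration under stated assumptions: a hypothetical ground-truth dynamics (`true_step`) and an imperfect "learned" model (`model_step`, identical except that it linearizes the sine term) stand in for the environment and the Chapter 52 network; the plotting call is left as a comment so the sketch runs headless.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_step(s, a):
    # Hypothetical ground-truth dynamics (stand-in for the environment).
    return np.array([s[0] + 0.05 * s[1],
                     s[1] + 0.05 * a - 0.02 * np.sin(s[0])])

def model_step(s, a):
    # Imperfect "learned" model: sin(x) replaced by x, mimicking the
    # small systematic bias of a trained one-step predictor.
    return np.array([s[0] + 0.05 * s[1],
                     s[1] + 0.05 * a - 0.02 * s[0]])

s0 = np.array([0.5, 0.0])
actions = rng.choice([-1.0, 1.0], size=30)  # fixed action sequence

true_traj, pred_traj = [s0], [s0]
for a in actions:
    true_traj.append(true_step(true_traj[-1], a))
    pred_traj.append(model_step(pred_traj[-1], a))
true_traj, pred_traj = np.array(true_traj), np.array(pred_traj)

# Per-step distance between the real and predicted trajectories.
err = np.linalg.norm(true_traj - pred_traj, axis=1)
print("per-step error:", np.round(err, 4))

# To draw the chapter's plot, overlay both trajectories, e.g.:
#   import matplotlib.pyplot as plt
#   plt.plot(true_traj[:, 0], label="true")
#   plt.plot(pred_traj[:, 0], "--", label="model")
#   plt.xlabel("step"); plt.legend(); plt.show()
```

The `err` curve starts at zero (shared initial state) and grows with the rollout length; on the plot, the dashed model trajectory tracks the true one early and drifts later, which is where "the model diverges from reality."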

March 10, 2026 · 3 min · 466 words · codefrydev