Chapter 57: Dreamer and Latent Imagination
Learning objectives

- Implement a simplified Dreamer-style algorithm: train an RSSM-like model on collected trajectories, then roll out in latent space to train an actor-critic.
- Understand the imagination phase: no real environment steps are taken; only latent rollouts drive policy updates.
- Relate the approach to robot control and sample-efficient RL.

Concept and real-world RL

Dreamer learns a recurrent state-space model (RSSM) in latent space: encode each observation into a latent state, predict the next latent given the action, and predict the reward and a continuation (discount) flag. The actor-critic is trained on imagined rollouts (latent only), so many gradient steps use no real environment interaction. In robot navigation and game AI, this yields high sample efficiency. The key is training the model and the policy on the same collected data, so that the latent space remains useful for control. ...
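The imagination phase described above can be sketched in a few dozen lines. This is a minimal numpy illustration, not Dreamer itself: a toy linear-tanh transition with a linear reward head stands in for a trained RSSM (all parameter names such as `W_h`, `W_a`, `w_r`, `W_pi`, `w_v` are hypothetical), and a frozen model is rolled out purely in latent space to produce TD(lambda)-style targets for an actor-critic.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, ACTION, HORIZON, GAMMA, LAM = 8, 2, 15, 0.99, 0.95

# Stand-ins for trained RSSM parameters (frozen during imagination).
W_h = rng.normal(scale=0.3, size=(LATENT, LATENT))  # latent transition
W_a = rng.normal(scale=0.3, size=(LATENT, ACTION))  # action influence
w_r = rng.normal(scale=0.3, size=LATENT)            # reward head

def model_step(h, a):
    """Predict next latent and reward -- no real env step is taken."""
    h_next = np.tanh(W_h @ h + W_a @ a)
    return h_next, w_r @ h_next

def policy(h, W_pi):
    """Deterministic toy actor: bounded action from the latent."""
    return np.tanh(W_pi @ h)

def critic(h, w_v):
    """Linear toy value head."""
    return w_v @ h

def imagine(h0, W_pi, w_v):
    """Roll out HORIZON steps purely in latent space."""
    h, rewards, values = h0, [], []
    for _ in range(HORIZON):
        a = policy(h, W_pi)
        h, r = model_step(h, a)
        rewards.append(r)
        values.append(critic(h, w_v))
    return np.array(rewards), np.array(values)

def lambda_returns(rewards, values):
    """Backward recursion for TD(lambda) targets, bootstrapped from v(h_H)."""
    ret = values[-1]
    out = np.zeros_like(rewards)
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + GAMMA * ((1 - LAM) * values[t] + LAM * ret)
        out[t] = ret
    return out

# One imagined rollout from a single start latent.
W_pi = rng.normal(scale=0.1, size=(ACTION, LATENT))
w_v = np.zeros(LATENT)
h0 = rng.normal(size=LATENT)
rewards, values = imagine(h0, W_pi, w_v)
targets = lambda_returns(rewards, values)
```

In a full implementation the start latents `h0` come from encoding real replay-buffer observations, the actor maximizes the lambda-returns by backpropagating through the differentiable model, and the critic regresses toward those same targets; the sketch above shows only the data flow of the latent rollout.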