Chapter 59: Probabilistic Ensembles with Trajectory Sampling (PETS)
Learning objectives

Implement PETS: an ensemble of probabilistic dynamics models (each outputs, for example, a mean and a variance for the next state), trajectory sampling to propagate that uncertainty through imagined rollouts, and a sampling-based optimizer (e.g. random shooting or CEM) to select actions via model predictive control (MPC).
Use the model to evaluate candidate action sequences and pick the best one (no policy network).
Apply the method to a continuous control task and compare it with a policy-based method.

Concept and real-world RL

PETS uses an ensemble of probabilistic models to capture uncertainty: each model's predicted variance accounts for noise in the environment, while disagreement between ensemble members reflects how little data the models have seen in a region. At each step, the agent samples many action sequences, rolls them out in the model, and picks the sequence with the best predicted return. Only the first action of that sequence is executed; the agent then replans from the next observed state (MPC). No policy network is trained; action selection is planning at test time. In robot control, MPC with learned models is used when we can afford the computation at deployment; in trading, short-horizon planning with a learned model can improve decisions. ...
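To make the loop concrete, here is a minimal sketch of the PETS idea in plain Python. It is an illustration under simplifying assumptions, not the reference implementation: the environment (`true_step`) is a hypothetical noisy 1-D integrator, each ensemble member is a linear-Gaussian model fitted by least squares on a bootstrap resample (standing in for the neural networks usually used), and every function name and hyperparameter is invented for this example. The planner is CEM, and each particle keeps one ensemble member for its whole rollout, which is one possible way to do trajectory sampling.

```python
import math
import random


def true_step(s, u, rng):
    """Hypothetical toy environment: a noisy 1-D integrator."""
    return s + 0.1 * u + rng.gauss(0.0, 0.01)


def fit_member(data, rng):
    """Fit one probabilistic model on a bootstrap resample of (s, u, delta).

    Assumed model: delta ~ Normal(a*s + b*u, var), with (a, b) from the
    2x2 least-squares normal equations and var from the residuals.
    """
    sample = [rng.choice(data) for _ in data]
    sss = sum(s * s for s, u, d in sample)
    suu = sum(u * u for s, u, d in sample)
    ssu = sum(s * u for s, u, d in sample)
    ssd = sum(s * d for s, u, d in sample)
    sud = sum(u * d for s, u, d in sample)
    det = sss * suu - ssu * ssu
    a = (ssd * suu - ssu * sud) / det
    b = (sss * sud - ssu * ssd) / det
    var = sum((d - a * s - b * u) ** 2 for s, u, d in sample) / len(sample)
    return a, b, max(var, 1e-12)


def predicted_return(ensemble, s0, actions, goal, n_particles, rng):
    """Average return of an action sequence over sampled model trajectories."""
    total = 0.0
    for _ in range(n_particles):
        a, b, var = rng.choice(ensemble)  # one member per particle
        s = s0
        for u in actions:
            s = s + rng.gauss(a * s + b * u, math.sqrt(var))
            total -= (s - goal) ** 2  # reward = negative squared distance
    return total / n_particles


def cem_plan(ensemble, s0, goal, rng, horizon=5, pop=40, n_elite=8,
             iters=3, n_particles=5):
    """Cross-entropy method over action sequences; returns the first action."""
    mean, std = [0.0] * horizon, [1.0] * horizon
    for _ in range(iters):
        cands = [[max(-1.0, min(1.0, rng.gauss(m, sd)))
                  for m, sd in zip(mean, std)] for _ in range(pop)]
        cands.sort(key=lambda acts: predicted_return(
            ensemble, s0, acts, goal, n_particles, rng), reverse=True)
        cols = list(zip(*cands[:n_elite]))  # elite actions grouped by time step
        mean = [sum(c) / n_elite for c in cols]
        std = [math.sqrt(sum((x - m) ** 2 for x in c) / n_elite) + 1e-3
               for c, m in zip(cols, mean)]
    return mean[0]


def run_mpc(seed=0, steps=40, goal=1.0, n_models=5):
    """Collect random data, fit the ensemble, then control with MPC."""
    rng = random.Random(seed)
    data = []
    for _ in range(200):
        s, u = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
        data.append((s, u, true_step(s, u, rng) - s))
    ensemble = [fit_member(data, rng) for _ in range(n_models)]
    s = 0.0
    for _ in range(steps):
        u = cem_plan(ensemble, s, goal, rng)  # replan at every step
        s = true_step(s, u, rng)              # execute only the first action
    return ensemble, s
```

Calling `run_mpc()` returns the fitted ensemble and the final state, which should end up close to `goal` on this toy problem. On a real continuous control task, the linear models would typically be replaced by neural networks that output a mean and variance, and the squared-distance reward by the task's own reward function.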