Chapter 74: Introduction to Imitation Learning

Learning objectives:
- Collect expert demonstrations (state-action pairs or trajectories) from a trained PPO agent on LunarLander.
- Train a behavioral cloning (BC) agent: supervised learning to predict the expert’s action given the state.
- Evaluate the BC policy in the environment and compare its return and behavior to the expert.
- Explain the assumptions of behavioral cloning (i.i.d. states from the expert distribution) and when it works well.
- Relate imitation learning to robot navigation (learning from human demos) and dialogue (learning from human responses).

Concept and real-world RL ...

March 10, 2026 · 3 min · 626 words · codefrydev
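
The behavioral cloning recipe in the Chapter 74 summary (collect expert state-action pairs, fit a supervised model, compare with the expert) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the chapter's implementation: the 1-D environment, the hand-written `expert_policy` (standing in for the trained PPO agent on LunarLander), and the logistic-regression learner are all hypothetical choices made so the example runs with the standard library alone.

```python
import math
import random

def expert_policy(x):
    """Hypothetical expert: push toward the origin (1 = right, 0 = left)."""
    return 1 if x < 0.0 else 0

def step(x, action, step_size=0.1):
    """Toy 1-D dynamics (an assumption): move right or left by a fixed step."""
    return x + (step_size if action == 1 else -step_size)

def collect_demos(policy, episodes=40, horizon=10, seed=0):
    """Roll out `policy` and record the (state, action) pairs it produces."""
    rng = random.Random(seed)
    data = []
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            a = policy(x)
            data.append((x, a))
            x = step(x, a)
    return data

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_bc(data, epochs=300, lr=0.5):
    """Fit p(a=1 | x) = sigmoid(w*x + b) by gradient descent on cross-entropy."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, a in data:
            err = sigmoid(w * x + b) - a  # derivative of the loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def make_bc_policy(w, b):
    """Greedy policy: pick the action the cloned model considers more likely."""
    return lambda x: 1 if sigmoid(w * x + b) >= 0.5 else 0

def agreement(policy, states):
    """Fraction of states where `policy` picks the same action as the expert."""
    return sum(policy(x) == expert_policy(x) for x in states) / len(states)

demos = collect_demos(expert_policy)   # expert data only
w, b = train_bc(demos)                 # supervised learning step
bc_policy = make_bc_policy(w, b)

rng = random.Random(1)
held_out = [rng.uniform(-1.0, 1.0) for _ in range(200)]
print(f"w={w:.2f}, b={b:.2f}, agreement={agreement(bc_policy, held_out):.2f}")
```

Agreement with the expert on fresh states is only a proxy. The summary's evaluation step compares returns from rollouts in the environment, which also exposes mistakes that accumulate over a trajectory.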

Chapter 75: Limitations of Behavioral Cloning

Learning objectives:
- Demonstrate the covariate shift problem: run the BC agent, record states it visits that were rare or absent in the expert data, and show that errors compound in those regions.
- Implement DAgger: collect new data by running the current BC policy (or a mix of expert and BC), query the expert for the correct action at those states, add to the dataset, and retrain BC.
- Explain why DAgger reduces covariate shift by adding on-policy (or mixed) states to the training set.
- Compare BC (trained only on expert data) with DAgger (iteratively aggregated) in terms of evaluation return and robustness.
- Relate covariate shift and DAgger to robot navigation and healthcare where the learner’s distribution can drift from the expert’s.

Concept and real-world RL ...

March 10, 2026 · 4 min · 807 words · codefrydev
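
The DAgger loop in the Chapter 75 summary (run the current policy or an expert/learner mix, query the expert at the visited states, aggregate, retrain) might look roughly like the sketch below. Everything concrete here is an assumption for illustration: the noisy 1-D environment, the hand-written `expert_policy`, the logistic-regression learner, and the mixing schedule `beta = 0.5 ** i` are hypothetical and not taken from the chapter.

```python
import math
import random

def expert_policy(x):
    """Hypothetical expert: push toward the origin (1 = right, 0 = left)."""
    return 1 if x < 0.0 else 0

def step(x, action, rng, step_size=0.1, noise=0.05):
    """Toy noisy 1-D dynamics (an assumption, not LunarLander)."""
    return x + (step_size if action == 1 else -step_size) + rng.gauss(0.0, noise)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(data, epochs=200, lr=0.5):
    """Supervised step: logistic regression p(a=1 | x) = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, a in data:
            err = sigmoid(w * x + b) - a
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def dagger(iterations=5, episodes=10, horizon=10, seed=0):
    rng = random.Random(seed)
    data = []          # aggregated (state, expert_action) pairs
    w, b = 0.0, 0.0    # untrained learner; unused while beta == 1
    for i in range(iterations):
        beta = 0.5 ** i  # probability of executing the expert's action
        for _ in range(episodes):
            x = rng.uniform(-1.0, 1.0)
            for _ in range(horizon):
                expert_a = expert_policy(x)
                # Always label the visited state with the expert's action.
                data.append((x, expert_a))
                learner_a = 1 if sigmoid(w * x + b) >= 0.5 else 0
                # Execute a mix: expert with probability beta, else learner.
                a = expert_a if rng.random() < beta else learner_a
                x = step(x, a, rng)
        # Retrain from scratch on everything aggregated so far.
        w, b = fit(data)
    return w, b, data

w, b, data = dagger()
rng = random.Random(1)
states = [rng.uniform(-1.0, 1.0) for _ in range(200)]
matches = sum((1 if sigmoid(w * x + b) >= 0.5 else 0) == expert_policy(x)
              for x in states)
print(f"dataset size={len(data)}, agreement={matches / len(states):.2f}")
```

With `beta = 1` in the first iteration, the initial dataset is pure expert data, which is the plain BC setting. Later iterations add states the learner itself reaches, which is where the summary locates the reduction in covariate shift.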