Chapter 74: Introduction to Imitation Learning
Learning objectives Collect expert demonstrations (state-action pairs or trajectories) from a trained PPO agent on LunarLander. Train a behavioral cloning (BC) agent: supervised learning to predict the expert’s action given the state. Evaluate the BC policy in the environment and compare its return and behavior to the expert. Explain the assumptions of behavioral cloning (i.i.d. states from the expert distribution) and when it works well. Relate imitation learning to robot navigation (learning from human demos) and dialogue (learning from human responses). Concept and real-world RL ...