Chapter 77: Generative Adversarial Imitation Learning (GAIL)

Learning objectives
- Implement GAIL: train a discriminator D(s, a) to distinguish state-action pairs drawn from the expert versus the current policy, and use the discriminator output (or log D) as the reward for a policy gradient method.
- Train the policy to maximize the discriminator reward (i.e., to fool the discriminator) while the discriminator learns to tell expert from agent.
- Test on a simple task (e.g., CartPole or a MuJoCo environment) and compare imitation quality with behavioral cloning.
- Explain the connection to GANs: the policy is the generator, and the discriminator supplies the learning signal.
- Relate GAIL to robot navigation and game AI, where we have expert demonstrations and want to match the expert distribution without hand-designed rewards.

Concept and real-world RL ...
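The core GAIL loop can be sketched with plain numpy: a logistic-regression discriminator D(s, a) is trained to output high values on expert state-action features and low values on policy samples, and the policy's reward is -log(1 - D(s, a)), which is large exactly when the policy fools the discriminator. The feature dimensions, data, and learning rate below are illustrative assumptions, not from the chapter; a real implementation would use a neural discriminator and feed the reward into a policy gradient update.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy stand-in data (hypothetical): expert and policy (s, a) feature vectors.
expert = rng.normal(loc=1.0, size=(256, 4))
policy = rng.normal(loc=-1.0, size=(256, 4))

# Discriminator D(s, a) = sigmoid(w . x + b), trained with binary cross-entropy
# to output ~1 on expert pairs and ~0 on policy pairs.
w, b, lr = np.zeros(4), 0.0, 0.1
for _ in range(200):
    g_exp = sigmoid(expert @ w + b) - 1.0   # BCE gradient w.r.t. logit, label 1
    g_pol = sigmoid(policy @ w + b)          # BCE gradient w.r.t. logit, label 0
    n = len(expert) + len(policy)
    w -= lr * (expert.T @ g_exp + policy.T @ g_pol) / n
    b -= lr * (g_exp.sum() + g_pol.sum()) / n

# GAIL reward for the policy: r(s, a) = -log(1 - D(s, a)).
d_pol = sigmoid(policy @ w + b)
d_exp = sigmoid(expert @ w + b)
reward_policy = -np.log(1.0 - d_pol + 1e-8)
reward_expert = -np.log(1.0 - d_exp + 1e-8)
# Expert-like pairs earn higher reward, so the policy is pushed toward the
# expert's state-action distribution.
```

In the full algorithm the policy is then updated (e.g., with TRPO or PPO) against this reward, and the discriminator is retrained on fresh policy rollouts, alternating like a GAN.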

March 10, 2026 · 4 min · 704 words · codefrydev

Chapter 78: Adversarial Motion Priors (AMP)

Learning objectives
- Read the AMP paper and explain how it combines a task reward (e.g., velocity tracking, goal reaching) with an adversarial style reward: a discriminator that scores how similar the agent's motion is to reference data.
- Write the combined reward function r = r_task + λ r_style, where r_style comes from a discriminator trained to distinguish agent motion from reference (e.g., motion capture) data.
- Identify why adding a style reward produces more natural-looking and robust locomotion than a task-only reward.
- Relate AMP to robot navigation and game AI (character animation), where we want both task success and natural motion.

Concept and real-world RL ...


March 10, 2026 · 4 min · 717 words · codefrydev