Chapter 68: Model-Agnostic Meta-Learning (MAML) in RL
Learning objectives

- Implement MAML for a simple RL task: sample tasks (e.g. different target velocities), compute the inner update (one or a few gradient steps on the task loss), then meta-update using the post-adaptation loss.
- Compute the meta-gradient (the gradient of the post-adaptation return or loss with respect to the initial parameters), using second-order derivatives or a first-order approximation.
- Explain why MAML learns an initialization that is "easy to fine-tune" with one or a few gradient steps.
- Train a policy that adapts to a new task in one gradient step, and evaluate it on held-out tasks.
- Relate MAML to robot navigation (e.g. different terrains or payloads) and game AI (different levels).

Concept and real-world RL ...
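To make the inner/outer loop concrete, here is a minimal sketch of MAML's two-level update on a toy task family. This is an illustrative stand-in for the RL setting, not the chapter's implementation: each "task" is a hypothetical target velocity `g`, the per-task RL objective is replaced by a scalar quadratic surrogate loss, and `task_loss`, `alpha`, and `beta` are assumed names and values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def task_loss(theta, goal):
    # Surrogate per-task loss: squared distance of the scalar "policy
    # parameter" theta from the task's target velocity.
    return (theta - goal) ** 2

def task_grad(theta, goal):
    # Analytic gradient of the quadratic surrogate loss.
    return 2.0 * (theta - goal)

alpha = 0.4   # inner-loop (adaptation) step size -- illustrative value
beta = 0.5    # outer-loop (meta) step size -- illustrative value
theta = 5.0   # the meta-initialization being learned

for _ in range(200):
    goals = rng.uniform(-1.0, 1.0, size=8)  # sample a batch of tasks
    meta_grad = 0.0
    for g in goals:
        # Inner update: one gradient step on the task loss.
        theta_prime = theta - alpha * task_grad(theta, g)
        # Second-order meta-gradient via the chain rule. For this quadratic
        # loss, d(theta')/d(theta) = 1 - alpha * L''(theta) = 1 - 2*alpha.
        # Dropping that factor (treating it as 1) gives first-order MAML.
        meta_grad += task_grad(theta_prime, g) * (1.0 - 2.0 * alpha)
    # Meta-update: move the initialization to lower post-adaptation loss.
    theta -= beta * meta_grad / len(goals)

# Adapt in ONE gradient step to a held-out task.
g_new = 0.7
theta_adapted = theta - alpha * task_grad(theta, g_new)
```

After meta-training, `theta` sits near the center of the task distribution, so a single inner step on the held-out task cuts the loss by a large factor; this is exactly the "easy to fine-tune initialization" property the objectives describe.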