Chapter 67: Meta-Learning (Learning to Learn)

Learning objectives

- Define a distribution of tasks (e.g. different goal positions in a gridworld) and sample tasks for meta-training.
- Implement a meta-training loop: for each task, collect data or run a few steps of adaptation, then update the meta-policy or meta-parameters to improve few-shot performance.
- Explain the goal of meta-RL: learn an initialization or algorithm that adapts quickly to new tasks with few gradient steps or few episodes.
- Evaluate the meta-learned policy on held-out tasks with limited data and compare with training from scratch.
- Relate meta-RL to robot navigation (different goals or terrains) and game AI (different levels or opponents).

Concept and real-world RL ...
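The meta-training loop above can be sketched in a few lines. This is a toy illustration, not the chapter's code: the task distribution (1D goal positions standing in for gridworld goals), the quadratic per-task loss, and the Reptile-style first-order meta-update are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

# Toy meta-training loop (illustrative assumptions, not the chapter's code):
# each task is a 1D goal position g, the "policy" is a single parameter
# theta, and the per-task loss is (theta - g)^2.
rng = np.random.default_rng(0)

def sample_task():
    return rng.uniform(-1.0, 1.0)  # goal position g for this task

def adapt(theta, g, inner_lr=0.1, steps=3):
    # Inner loop: a few gradient steps of task-specific adaptation.
    for _ in range(steps):
        grad = 2.0 * (theta - g)   # d/dtheta of (theta - g)^2
        theta = theta - inner_lr * grad
    return theta

# Outer loop (Reptile-style meta-update): move the shared initialization
# toward each task's post-adaptation parameters.
theta = 5.0       # deliberately poor initialization
meta_lr = 0.1
for _ in range(2000):
    g = sample_task()
    theta_adapted = adapt(theta, g)
    theta += meta_lr * (theta_adapted - theta)
```

After meta-training, `theta` sits near the center of the goal distribution, so a few inner-loop steps suffice to reach any sampled goal; comparing that with adapting from the original `theta = 5.0` is the "versus training from scratch" evaluation the objectives mention.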

March 10, 2026 · 4 min · 714 words · codefrydev

Chapter 68: Model-Agnostic Meta-Learning (MAML) in RL

Learning objectives

- Implement MAML for a simple RL task: sample tasks (e.g. different target velocities), compute the inner update (one or a few gradient steps on the task loss), then meta-update using the post-adaptation loss.
- Compute the meta-gradient (gradient of the post-adaptation return or loss w.r.t. the initial parameters), using second-order derivatives or a first-order approximation.
- Explain why MAML learns an initialization that is “easy to fine-tune” with one or few gradient steps.
- Train a policy that adapts in one gradient step to a new task and evaluate on held-out tasks.
- Relate MAML to robot navigation (e.g. different terrains or payloads) and game AI (different levels).

Concept and real-world RL ...
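The inner update and meta-gradient can be made concrete on a problem small enough to differentiate by hand. A minimal sketch, assuming tasks are scalar target velocities `v` with loss `(theta - v)^2` (not the chapter's code): for this quadratic loss the exact second-order meta-gradient and its first-order (FOMAML) approximation differ only by the factor `(1 - 2 * inner_lr)`, which the code computes analytically.

```python
import numpy as np

# Illustrative MAML sketch: each task is a target velocity v, the policy
# is one parameter theta, and the task loss is L_v(theta) = (theta - v)^2.
rng = np.random.default_rng(1)
inner_lr, meta_lr = 0.1, 0.05

def inner_update(theta, v):
    # One gradient step on the task loss: theta' = theta - a * dL/dtheta.
    return theta - inner_lr * 2.0 * (theta - v)

def meta_grad(theta, v, first_order=False):
    # Gradient of the post-adaptation loss L_v(theta') w.r.t. theta.
    theta_prime = inner_update(theta, v)
    g = 2.0 * (theta_prime - v)        # dL/dtheta' at the adapted params
    if first_order:
        return g                        # FOMAML: treat dtheta'/dtheta as 1
    return g * (1.0 - 2.0 * inner_lr)   # exact dtheta'/dtheta for this loss

theta = 4.0
for _ in range(1000):
    v = rng.uniform(-1.0, 1.0)         # sample a task
    theta -= meta_lr * meta_grad(theta, v)
```

The meta-update drives `theta` toward the center of the task distribution, i.e. the initialization from which a single inner step lands closest to any target velocity; that is the "easy to fine-tune" property in miniature.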

March 10, 2026 · 3 min · 636 words · codefrydev