Volume 7: Exploration and Meta-Learning
Chapters 61–70 — the hard exploration problem, intrinsic motivation, curiosity (ICM), RND, count-based exploration, Go-Explore, meta-learning, MAML in RL, RL², and unsupervised environment design (UED).
Chapter 61: DQN with ε-greedy on Montezuma's Revenge; why sparse rewards defeat undirected exploration.
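The baseline being stress-tested here is plain ε-greedy action selection: with probability ε take a uniformly random action, otherwise the greedy one. A minimal sketch (the Q-value list is an illustrative stand-in for a learned Q-network's outputs):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon,
    otherwise the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

On a sparse-reward game this undirected noise almost never strings together the long action sequences needed to reach the first reward, which is the chapter's motivating failure.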
Chapter 62: State-visitation count bonus; exploration in a tabular gridworld.
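The count bonus can be sketched in a few lines: keep a visitation table N(s) and add an intrinsic bonus β/√N(s) to the reward, so rarely visited states pay more. The class name and the β=0.1 default are illustrative assumptions:

```python
from collections import defaultdict
from math import sqrt

class CountBonus:
    """Intrinsic bonus beta / sqrt(N(s)) from a visitation-count table.
    Works for tabular states (anything hashable)."""
    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state):
        self.counts[state] += 1           # record the visit
        return self.beta / sqrt(self.counts[state])
```

The first visit to a state earns the full β; the fourth visit earns β/2, so the bonus decays smoothly rather than vanishing after one visit.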
Chapter 63: ICM — forward model, prediction error as intrinsic reward; A2C on a maze.
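The core of ICM is a forward dynamics model trained on (feature, action) → next-feature pairs, with its prediction error used as the intrinsic reward: familiar transitions become predictable and stop paying. A toy sketch assuming a linear forward model and pre-computed feature vectors (the real ICM also learns the feature encoder via an inverse model, omitted here):

```python
import numpy as np

class ForwardModelCuriosity:
    """Toy ICM-style module: a linear forward model predicts the next
    state feature from (state feature, one-hot action); its squared
    prediction error is the intrinsic reward."""
    def __init__(self, feat_dim, n_actions, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(feat_dim, feat_dim + n_actions))
        self.n_actions = n_actions
        self.lr = lr

    def intrinsic_reward(self, phi_s, action, phi_next):
        a = np.zeros(self.n_actions); a[action] = 1.0
        x = np.concatenate([np.asarray(phi_s, float), a])
        err = self.W @ x - np.asarray(phi_next, float)
        # One normalized SGD step on the forward loss ||pred - phi_next||^2,
        # so repeated transitions become predictable and the reward decays.
        self.W -= self.lr * np.outer(err, x) / (x @ x)
        return float(err @ err)
```

Replaying the same deterministic transition makes the reward shrink each time, which is exactly the "boredom" effect curiosity relies on.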
Chapter 64: RND — fixed random target network, trained predictor; prediction error as intrinsic reward.
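RND sidesteps dynamics modelling entirely: a randomly initialized target network is frozen, a predictor is trained to match its outputs, and the predictor's error on a state is the novelty signal. A minimal sketch with linear networks (layer sizes and learning rate are illustrative assumptions):

```python
import numpy as np

class RND:
    """Toy RND: a frozen random target network and a trained predictor.
    The predictor's squared error on an observation is the intrinsic
    reward; it stays high on novel states and shrinks on familiar ones."""
    def __init__(self, obs_dim, out_dim=8, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.target = rng.normal(size=(out_dim, obs_dim))   # frozen forever
        self.pred = np.zeros((out_dim, obs_dim))            # trained online
        self.lr = lr

    def intrinsic_reward(self, obs):
        obs = np.asarray(obs, dtype=float)
        err = self.pred @ obs - self.target @ obs
        # One normalized SGD step toward the target's output.
        self.pred -= self.lr * np.outer(err, obs) / (obs @ obs)
        return float(err @ err)
```

Because the target is fixed, the error is deterministic per state, so RND avoids the "noisy TV" trap where stochastic transitions stay forever unpredictable.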
Chapter 65: Count-based exploration with a hash table; pseudo-counts from a density model for images.
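Exact counts break down for continuous or high-dimensional observations, since no state repeats. One fix is SimHash-style counting: project the observation through a fixed random matrix and use the sign pattern as a discrete key, so nearby observations share a bucket. A sketch (k=16 sign bits is an illustrative choice):

```python
import numpy as np

class HashedCounts:
    """SimHash-style counting: the sign pattern of a fixed random
    projection of the observation is the hash key, so similar
    observations fall into the same bucket."""
    def __init__(self, obs_dim, k=16, seed=0):
        self.A = np.random.default_rng(seed).normal(size=(k, obs_dim))
        self.counts = {}

    def count(self, obs):
        key = tuple((self.A @ np.asarray(obs, float)) > 0)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]
```

The pseudo-count approach mentioned alongside it replaces the hash table with a learned density model, deriving an implicit count from how much a state's probability rises after one visit.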
Chapter 66: Simplified Go-Explore on a deterministic maze; cell archive and return-then-explore.
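Go-Explore's loop is: keep an archive mapping each discovered cell to a trajectory that reaches it, repeatedly pick an archived cell, return to it, then explore a few random steps, archiving any new cells. A sketch on a toy deterministic chain environment (the `step` interface, chain length, and iteration counts are illustrative assumptions; in a deterministic environment, restoring the cell is equivalent to replaying its stored actions):

```python
import random

def go_explore(step, start_cell, n_iters=1000, explore_len=5, seed=0):
    """Minimal Go-Explore: `step(cell, action) -> cell` is a
    deterministic transition; the archive maps each reached cell to
    the action sequence that first reached it from the start."""
    rng = random.Random(seed)
    archive = {start_cell: []}
    for _ in range(n_iters):
        cell = rng.choice(list(archive))    # select an archived cell
        traj = list(archive[cell])          # 'return' to it
        for _ in range(explore_len):        # then explore randomly
            a = rng.choice([0, 1])
            cell = step(cell, a)
            traj.append(a)
            if cell not in archive:         # archive first discoveries
                archive[cell] = list(traj)
    return archive

# Toy deterministic chain: action 1 moves right, 0 moves left, clamped.
chain = lambda cell, a: min(max(cell + (1 if a else -1), 0), 10)
```

Because exploration restarts from the frontier instead of from scratch, the archive steadily extends along the chain, which is the mechanism that cracked hard-exploration Atari games.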
Chapter 67: Meta-RL setup — task distribution (e.g. goal positions); meta-training loop with few-step adaptation.
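The meta-RL problem setup, stripped to its bones: a distribution of tasks, a short adaptation phase on a sampled task, then evaluation of the adapted behavior. A hedged toy using a bandit task distribution (the hidden task is which arm pays; the 0.9/0.1 payoff probabilities and pull budgets are illustrative assumptions, not the chapter's environment):

```python
import random

def meta_episode(best_arm, n_arms=5, adapt_pulls=3, eval_pulls=10, seed=0):
    """One meta-episode: adapt to a sampled task with a few
    exploratory pulls, then exploit the adapted estimate."""
    rng = random.Random(seed)
    payoff = lambda arm: 1.0 if rng.random() < (0.9 if arm == best_arm else 0.1) else 0.0
    # Adaptation phase: estimate each arm's mean from a few pulls.
    means = [sum(payoff(arm) for _ in range(adapt_pulls)) / adapt_pulls
             for arm in range(n_arms)]
    guess = max(range(n_arms), key=means.__getitem__)
    # Evaluation phase: exploit the adapted policy.
    return sum(payoff(guess) for _ in range(eval_pulls))
```

A meta-training loop then optimizes whatever the adaptation procedure is (here hard-coded; in MAML, a gradient step; in RL², a recurrent policy) for high post-adaptation return averaged over tasks.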
Chapter 68: MAML for locomotion tasks (e.g. different target velocities); one-step adaptation.
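MAML's structure shows up even in a scalar problem: learn an initialization such that one inner gradient step lands near any task's optimum. A first-order (FOMAML) sketch on toy 1-D regression tasks, with analytic gradients standing in for policy gradients; the task family y = a·x and all hyperparameters are illustrative assumptions:

```python
import random

def fomaml_1d(task_dist, meta_iters=500, alpha=0.1, beta=0.05, seed=0):
    """First-order MAML on scalar tasks with loss L_a(w) = (w - a)^2.
    Inner loop: one gradient step on the sampled task's loss.
    Outer loop (first-order approximation): step the meta-parameter
    along the gradient evaluated at the adapted parameter."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(meta_iters):
        a = task_dist(rng)                  # sample a task parameter
        w_adapt = w - alpha * 2 * (w - a)   # inner adaptation step
        w = w - beta * 2 * (w_adapt - a)    # first-order meta-update
    return w
```

With tasks drawn around a common center, the meta-initialization converges near that center, so a single adaptation step reaches any individual task; full MAML additionally backpropagates through the inner step.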
Chapter 69: RL² — RNN policy with (state, action, reward, done) inputs; POMDP task distributions.
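The defining detail of RL² is what the recurrent policy sees each step: not just the state, but also the previous action, reward, and done flag, so the hidden state can accumulate task information across episodes within a trial. A sketch of that input construction (the one-hot action encoding is an illustrative choice):

```python
import numpy as np

def rl2_input(state, prev_action, prev_reward, prev_done, n_actions):
    """Build the per-step RNN input for an RL^2-style policy: the
    current state concatenated with a one-hot of the previous action,
    the previous reward, and the previous done flag."""
    one_hot = np.zeros(n_actions)
    if prev_action is not None:   # None at the first step of a trial
        one_hot[prev_action] = 1.0
    return np.concatenate([np.asarray(state, float), one_hot,
                           [float(prev_reward)], [float(prev_done)]])
```

Feeding reward and done back in is what lets adaptation emerge in the hidden state rather than in the weights: the same frozen network behaves differently after observing different reward histories.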
Chapter 70: Simplified PAIRED — an adversary designs mazes, the agent solves them; both trained jointly.
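PAIRED's environment designer is trained to maximize regret, estimated with a second "antagonist" agent: the best antagonist return on a proposed level minus the protagonist's mean return. High regret means the level is solvable but not yet solved, which steers the curriculum toward the frontier of the agent's ability. The objective as a one-liner (function name is illustrative):

```python
def paired_regret(antagonist_returns, protagonist_returns):
    """Regret estimate for the PAIRED adversary: best antagonist
    return minus mean protagonist return on the same level. The
    adversary maximizes this, so unsolvable levels (where even the
    antagonist scores zero) earn no reward."""
    return max(antagonist_returns) - sum(protagonist_returns) / len(protagonist_returns)
```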
Review Volume 7 (Exploration, ICM, RND, Go-Explore, Meta-RL) and preview Volume 8 (Offline RL, Imitation Learning, RLHF).