Volume 7: Exploration and Meta-Learning
Chapters 61–70 — the hard exploration problem, intrinsic motivation, curiosity (ICM), RND, count-based exploration, Go-Explore, meta-learning, MAML in RL, RL², and unsupervised environment design (UED).
Chapter 61: DQN with ε-greedy on Montezuma's Revenge; why sparse rewards defeat undirected exploration.
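The baseline being stress-tested here is plain ε-greedy action selection: with probability ε take a uniformly random action, otherwise the greedy one. A minimal sketch (the Q-value list is an illustrative stand-in for a learned Q-network's outputs):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon,
    otherwise the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

On a sparse-reward game this undirected noise almost never strings together the long action sequences needed to reach the first reward, which is the chapter's motivating failure.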
Chapter 62: State-visitation count bonus; exploration in a tabular gridworld.
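The count bonus can be sketched in a few lines: keep a visitation table N(s) and add an intrinsic bonus β/√N(s) to the reward, so rarely visited states pay more. The class name and the β=0.1 default are illustrative assumptions:

```python
from collections import defaultdict
from math import sqrt

class CountBonus:
    """Intrinsic bonus beta / sqrt(N(s)) from a visitation-count table.
    Works for tabular states (anything hashable)."""
    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state):
        self.counts[state] += 1           # record the visit
        return self.beta / sqrt(self.counts[state])
```

The first visit to a state earns the full β; the fourth visit earns β/2, so the bonus decays smoothly rather than vanishing after one visit.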
Chapter 63: ICM — forward model, prediction error as intrinsic reward; A2C on a maze.
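The core of ICM is a forward dynamics model trained on (feature, action) → next-feature pairs, with its prediction error used as the intrinsic reward: familiar transitions become predictable and stop paying. A toy sketch assuming a linear forward model and pre-computed feature vectors (the real ICM also learns the feature encoder via an inverse model, omitted here):

```python
import numpy as np

class ForwardModelCuriosity:
    """Toy ICM-style module: a linear forward model predicts the next
    state feature from (state feature, one-hot action); its squared
    prediction error is the intrinsic reward."""
    def __init__(self, feat_dim, n_actions, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(feat_dim, feat_dim + n_actions))
        self.n_actions = n_actions
        self.lr = lr

    def intrinsic_reward(self, phi_s, action, phi_next):
        a = np.zeros(self.n_actions); a[action] = 1.0
        x = np.concatenate([np.asarray(phi_s, float), a])
        err = self.W @ x - np.asarray(phi_next, float)
        # One normalized SGD step on the forward loss ||pred - phi_next||^2,
        # so repeated transitions become predictable and the reward decays.
        self.W -= self.lr * np.outer(err, x) / (x @ x)
        return float(err @ err)
```

Replaying the same deterministic transition makes the reward shrink each time, which is exactly the "boredom" effect curiosity relies on.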
Chapter 64: RND — fixed random target network, trained predictor; prediction error as intrinsic reward.
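RND sidesteps dynamics modelling entirely: a randomly initialized target network is frozen, a predictor is trained to match its outputs, and the predictor's error on a state is the novelty signal. A minimal sketch with linear networks (layer sizes and learning rate are illustrative assumptions):

```python
import numpy as np

class RND:
    """Toy RND: a frozen random target network and a trained predictor.
    The predictor's squared error on an observation is the intrinsic
    reward; it stays high on novel states and shrinks on familiar ones."""
    def __init__(self, obs_dim, out_dim=8, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.target = rng.normal(size=(out_dim, obs_dim))   # frozen forever
        self.pred = np.zeros((out_dim, obs_dim))            # trained online
        self.lr = lr

    def intrinsic_reward(self, obs):
        obs = np.asarray(obs, dtype=float)
        err = self.pred @ obs - self.target @ obs
        # One normalized SGD step toward the target's output.
        self.pred -= self.lr * np.outer(err, obs) / (obs @ obs)
        return float(err @ err)
```

Because the target is fixed, the error is deterministic per state, so RND avoids the "noisy TV" trap where stochastic transitions stay forever unpredictable.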
Chapter 65: Count-based exploration with a hash table; pseudo-counts from a density model for images.
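Exact counts break down for continuous or high-dimensional observations, since no state repeats. One fix is SimHash-style counting: project the observation through a fixed random matrix and use the sign pattern as a discrete key, so nearby observations share a bucket. A sketch (k=16 sign bits is an illustrative choice):

```python
import numpy as np

class HashedCounts:
    """SimHash-style counting: the sign pattern of a fixed random
    projection of the observation is the hash key, so similar
    observations fall into the same bucket."""
    def __init__(self, obs_dim, k=16, seed=0):
        self.A = np.random.default_rng(seed).normal(size=(k, obs_dim))
        self.counts = {}

    def count(self, obs):
        key = tuple((self.A @ np.asarray(obs, float)) > 0)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]
```

The pseudo-count approach mentioned alongside it replaces the hash table with a learned density model, deriving an implicit count from how much a state's probability rises after one visit.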
Chapter 66: Simplified Go-Explore on a deterministic maze; cell archive and return-then-explore.
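Go-Explore's loop is: keep an archive mapping each discovered cell to a trajectory that reaches it, repeatedly pick an archived cell, return to it, then explore a few random steps, archiving any new cells. A sketch on a toy deterministic chain environment (the `step` interface, chain length, and iteration counts are illustrative assumptions; in a deterministic environment, restoring the cell is equivalent to replaying its stored actions):

```python
import random

def go_explore(step, start_cell, n_iters=1000, explore_len=5, seed=0):
    """Minimal Go-Explore: `step(cell, action) -> cell` is a
    deterministic transition; the archive maps each reached cell to
    the action sequence that first reached it from the start."""
    rng = random.Random(seed)
    archive = {start_cell: []}
    for _ in range(n_iters):
        cell = rng.choice(list(archive))    # select an archived cell
        traj = list(archive[cell])          # 'return' to it
        for _ in range(explore_len):        # then explore randomly
            a = rng.choice([0, 1])
            cell = step(cell, a)
            traj.append(a)
            if cell not in archive:         # archive first discoveries
                archive[cell] = list(traj)
    return archive

# Toy deterministic chain: action 1 moves right, 0 moves left, clamped.
chain = lambda cell, a: min(max(cell + (1 if a else -1), 0), 10)
```

Because exploration restarts from the frontier instead of from scratch, the archive steadily extends along the chain, which is the mechanism that cracked hard-exploration Atari games.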
Chapter 67: Meta-RL setup — task distribution (e.g. goal positions); meta-training loop with few-step adaptation.
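The meta-RL problem setup, stripped to its bones: a distribution of tasks, a short adaptation phase on a sampled task, then evaluation of the adapted behavior. A hedged toy using a bandit task distribution (the hidden task is which arm pays; the 0.9/0.1 payoff probabilities and pull budgets are illustrative assumptions, not the chapter's environment):

```python
import random

def meta_episode(best_arm, n_arms=5, adapt_pulls=3, eval_pulls=10, seed=0):
    """One meta-episode: adapt to a sampled task with a few
    exploratory pulls, then exploit the adapted estimate."""
    rng = random.Random(seed)
    payoff = lambda arm: 1.0 if rng.random() < (0.9 if arm == best_arm else 0.1) else 0.0
    # Adaptation phase: estimate each arm's mean from a few pulls.
    means = [sum(payoff(arm) for _ in range(adapt_pulls)) / adapt_pulls
             for arm in range(n_arms)]
    guess = max(range(n_arms), key=means.__getitem__)
    # Evaluation phase: exploit the adapted policy.
    return sum(payoff(guess) for _ in range(eval_pulls))
```

A meta-training loop then optimizes whatever the adaptation procedure is (here hard-coded; in MAML, a gradient step; in RL², a recurrent policy) for high post-adaptation return averaged over tasks.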
Chapter 68: MAML for locomotion tasks (e.g. different target velocities); one-step adaptation.
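MAML's structure shows up even in a scalar problem: learn an initialization such that one inner gradient step lands near any task's optimum. A first-order (FOMAML) sketch on toy 1-D regression tasks, with analytic gradients standing in for policy gradients; the task family y = a·x and all hyperparameters are illustrative assumptions:

```python
import random

def fomaml_1d(task_dist, meta_iters=500, alpha=0.1, beta=0.05, seed=0):
    """First-order MAML on scalar tasks with loss L_a(w) = (w - a)^2.
    Inner loop: one gradient step on the sampled task's loss.
    Outer loop (first-order approximation): step the meta-parameter
    along the gradient evaluated at the adapted parameter."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(meta_iters):
        a = task_dist(rng)                  # sample a task parameter
        w_adapt = w - alpha * 2 * (w - a)   # inner adaptation step
        w = w - beta * 2 * (w_adapt - a)    # first-order meta-update
    return w
```

With tasks drawn around a common center, the meta-initialization converges near that center, so a single adaptation step reaches any individual task; full MAML additionally backpropagates through the inner step.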
Chapter 69: RL² — RNN policy with (state, action, reward, done) inputs; POMDP task distributions.
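The defining detail of RL² is what the recurrent policy sees each step: not just the state, but also the previous action, reward, and done flag, so the hidden state can accumulate task information across episodes within a trial. A sketch of that input construction (the one-hot action encoding is an illustrative choice):

```python
import numpy as np

def rl2_input(state, prev_action, prev_reward, prev_done, n_actions):
    """Build the per-step RNN input for an RL^2-style policy: the
    current state concatenated with a one-hot of the previous action,
    the previous reward, and the previous done flag."""
    one_hot = np.zeros(n_actions)
    if prev_action is not None:   # None at the first step of a trial
        one_hot[prev_action] = 1.0
    return np.concatenate([np.asarray(state, float), one_hot,
                           [float(prev_reward)], [float(prev_done)]])
```

Feeding reward and done back in is what lets adaptation emerge in the hidden state rather than in the weights: the same frozen network behaves differently after observing different reward histories.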
Chapter 70: Simplified PAIRED — an adversary designs mazes, the agent solves them; both trained jointly.
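PAIRED's environment designer is trained to maximize regret, estimated with a second "antagonist" agent: the best antagonist return on a proposed level minus the protagonist's mean return. High regret means the level is solvable but not yet solved, which steers the curriculum toward the frontier of the agent's ability. The objective as a one-liner (function name is illustrative):

```python
def paired_regret(antagonist_returns, protagonist_returns):
    """Regret estimate for the PAIRED adversary: best antagonist
    return minus mean protagonist return on the same level. The
    adversary maximizes this, so unsolvable levels (where even the
    antagonist scores zero) earn no reward."""
    return max(antagonist_returns) - sum(protagonist_returns) / len(protagonist_returns)
```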
Review Volume 7 (Exploration, ICM, RND, Go-Explore, Meta-RL) and preview Volume 8 (Offline RL, Imitation Learning, RLHF).