Chapter 63: Curiosity-Driven Exploration (ICM)

Learning objectives:
- Implement the Intrinsic Curiosity Module (ICM): a forward model that predicts next-state features from the current state and action.
- Use the prediction error (between predicted and actual next-state features) as an intrinsic reward and combine it with A2C.
- Explain why prediction error encourages exploration in novel or stochastic parts of the state space.
- Compare exploration behavior (e.g. coverage, time to goal) with and without ICM on a sparse-reward maze.
- Relate curiosity-driven exploration to robot navigation and game AI, where rewards are sparse.

Concept and real-world RL ...
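The core of the forward-model objective can be sketched with a tiny numpy example. This is a minimal illustration under simplifying assumptions: hand-crafted state features and a linear predictor, where the full ICM learns a feature embedding jointly with an inverse-dynamics model. The class name `ForwardModel` and all hyperparameters here are illustrative, not from the chapter.

```python
import numpy as np

class ForwardModel:
    """Linear forward model: predicts next-state features from (phi(s), a).

    Illustrative sketch only; real ICM uses neural networks and learned
    feature embeddings trained alongside an inverse-dynamics model.
    """

    def __init__(self, feat_dim, n_actions, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Weights map concatenated [features, one-hot action] -> next features.
        self.W = rng.normal(scale=0.1, size=(feat_dim, feat_dim + n_actions))
        self.n_actions = n_actions
        self.lr = lr

    def _input(self, phi_s, a):
        one_hot = np.zeros(self.n_actions)
        one_hot[a] = 1.0
        return np.concatenate([phi_s, one_hot])

    def intrinsic_reward(self, phi_s, a, phi_next):
        # r_int = 1/2 * ||predicted features - actual features||^2
        pred = self.W @ self._input(phi_s, a)
        return 0.5 * np.sum((pred - phi_next) ** 2)

    def update(self, phi_s, a, phi_next):
        # One SGD step on the squared prediction error.
        x = self._input(phi_s, a)
        err = self.W @ x - phi_next
        self.W -= self.lr * np.outer(err, x)
        return 0.5 * np.sum(err ** 2)
```

The key dynamic: repeatedly visiting the same transition drives its prediction error, and hence its intrinsic reward, toward zero, so the agent is pushed toward transitions it cannot yet predict. In practice this reward is added (scaled) to the extrinsic reward before the A2C update.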

March 10, 2026 · 3 min · 624 words · codefrydev

Chapter 64: Random Network Distillation (RND)

Learning objectives:
- Implement RND: a fixed random target network and a predictor network that fits the target on visited states.
- Use the prediction error (target output vs. predictor output) as an intrinsic reward for exploration.
- Explain why RND rewards novelty without learning a forward model of the environment.
- Apply RND to a hard-exploration problem (e.g. Pitfall-style or sparse-reward maze) and compare with ε-greedy or count-based exploration.
- Relate RND to game AI and robot navigation, where state spaces are large and rewards are sparse.

Concept and real-world RL ...
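The target-vs-predictor mechanism can be sketched in a few lines of numpy. This is a simplified illustration under assumed details, linear "networks" and one-hot states, whereas the actual RND method uses deep networks and normalizes observations and rewards; the class name and hyperparameters are hypothetical.

```python
import numpy as np

class RND:
    """Random Network Distillation sketch with linear target/predictor maps."""

    def __init__(self, obs_dim, out_dim=8, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random target network: initialized once, never trained.
        self.T = rng.normal(size=(out_dim, obs_dim))
        # Predictor network: trained to match the target on visited states.
        self.P = np.zeros((out_dim, obs_dim))
        self.lr = lr

    def intrinsic_reward(self, s):
        # Novelty signal: squared error between predictor and frozen target.
        return 0.5 * np.sum((self.P @ s - self.T @ s) ** 2)

    def update(self, s):
        # Fit the predictor to the target on a visited state (one SGD step).
        err = self.P @ s - self.T @ s
        self.P -= self.lr * np.outer(err, s)
        return 0.5 * np.sum(err ** 2)
```

Note that nothing here models environment dynamics: the predictor only distills a fixed random function, so its error is low exactly on states it has seen often and stays high on novel ones. That is why RND avoids the forward-model machinery of ICM while still rewarding novelty.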

March 10, 2026 · 3 min · 628 words · codefrydev