Chapter 64: Random Network Distillation (RND)

Learning objectives

- Implement RND: a fixed random target network and a predictor network that fits the target on visited states.
- Use the prediction error (target output vs. predictor output) as an intrinsic reward for exploration.
- Explain why RND rewards novelty without learning a forward model of the environment.
- Apply RND to a hard-exploration problem (e.g. a Pitfall-style game or a sparse-reward maze) and compare with ε-greedy or count-based exploration.
- Relate RND to game AI and robot navigation, where state spaces are large and rewards sparse.

Concept and real-world RL ...
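The core mechanism in the objectives above can be sketched in a few lines. Below is a minimal NumPy toy (not from the chapter; the network shapes, learning rate, and single-layer `tanh` architecture are illustrative assumptions): a frozen random target network, a trainable predictor, and an intrinsic reward equal to their squared prediction error. States the predictor has fit yield low reward; unvisited states keep high reward.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, FEAT_DIM = 4, 8  # toy sizes, chosen for illustration

# Fixed random target network f(s): never trained.
W_target = rng.normal(size=(STATE_DIM, FEAT_DIM))

# Predictor network f_hat(s): trained to match the target on visited states.
W_pred = np.zeros((STATE_DIM, FEAT_DIM))

def target(s):
    return np.tanh(s @ W_target)

def predictor(s):
    return np.tanh(s @ W_pred)

def intrinsic_reward(s):
    # Prediction error: large on novel states, shrinks on familiar ones.
    return float(np.sum((target(s) - predictor(s)) ** 2))

def train_step(s, lr=0.05):
    # One gradient-descent step on the squared error, by hand:
    # d/dW of sum((tanh(sW) - y)^2) = outer(s, 2*(tanh(z)-y)*(1-tanh(z)^2)).
    global W_pred
    z = s @ W_pred
    err = np.tanh(z) - target(s)
    W_pred -= lr * np.outer(s, err * (1.0 - np.tanh(z) ** 2))

# Visit one state repeatedly: its intrinsic reward should decay.
s_familiar = rng.normal(size=STATE_DIM)
before = intrinsic_reward(s_familiar)
for _ in range(500):
    train_step(s_familiar)
after = intrinsic_reward(s_familiar)

# An unvisited state should still look novel.
s_novel = rng.normal(size=STATE_DIM)
print(after < before)
print(intrinsic_reward(s_novel) > after)
```

Note there is no environment model anywhere: the predictor regresses a fixed random function of the state, so low error only encodes "this state has been seen", which is exactly the novelty signal RND needs.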

March 10, 2026 · 3 min · 628 words · codefrydev