Chapter 61: The Hard Exploration Problem

Learning objectives

- Run DQN with ε-greedy on a sparse-reward environment (e.g. Montezuma’s Revenge if available, or a simple maze).
- Observe that the agent rarely discovers the first key (or goal) when rewards are sparse.
- Explain why sparse rewards cause failure: there is no learning signal until the goal is reached, and random exploration is unlikely to reach it.

Concept and real-world RL

Hard exploration occurs when the reward is sparse (e.g. only at the goal): the agent gets no signal until it accidentally reaches the goal, which may require a long, specific sequence of actions. In game AI (Montezuma’s Revenge, Pitfall), ε-greedy DQN fails because random exploration almost never finds the key. In robot navigation and recommendation, sparse rewards (e.g. “user clicked” or “reached goal”) similarly make learning slow. This motivates intrinsic motivation, curiosity, and hierarchical methods. ...
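The failure mode above can be demonstrated without a full DQN. The sketch below (my own illustrative setup, not from the chapter) uses a 1-D chain environment: the agent starts at state 0 and only receives reward at state `chain_len`, so it must string together many "right" moves before seeing any signal. Purely random exploration, which is what ε-greedy reduces to before any reward has been observed, reaches a nearby goal often but a distant one almost never:

```python
import random

def random_walk_success(chain_len, episodes=1000, horizon=50, seed=0):
    """Estimate how often purely random exploration reaches a sparse goal.

    Hypothetical environment: a 1-D chain where the agent starts at
    state 0 and the only reward sits at state `chain_len`. Actions are
    uniform-random left/right (left at state 0 is a no-op), mimicking
    epsilon-greedy behavior before any reward signal exists.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(episodes):
        pos = 0
        for _ in range(horizon):
            if rng.random() < 0.5:
                pos = min(pos + 1, chain_len)   # step right
            else:
                pos = max(pos - 1, 0)           # step left (reflects at 0)
            if pos == chain_len:
                successes += 1                   # sparse reward found
                break
    return successes / episodes

if __name__ == "__main__":
    # A nearby goal is found often; a distant one almost never is.
    print("goal at 5: ", random_walk_success(chain_len=5))
    print("goal at 15:", random_walk_success(chain_len=15))
```

Lengthening the chain (i.e. lengthening the required action sequence) makes the success rate collapse, which is exactly why the agent in Montezuma’s Revenge almost never stumbles onto the first key: without that first success there is no gradient signal at all, and this is what intrinsic-motivation and curiosity bonuses are designed to repair.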

March 10, 2026 · 3 min · 489 words · codefrydev