Chapter 28: Prioritized Experience Replay (PER)

Learning objectives

- Implement prioritized replay: assign each transition a priority \(p_i\) (e.g. the absolute TD error \(|\delta_i|\)) and sample transition \(i\) with probability \(P(i) = p_i^\alpha / \sum_k p_k^\alpha\).
- Use a sum tree (or a simpler alternative) for efficient sampling and priority updates.
- Apply importance-sampling weights \(w_i = (N \cdot P(i))^{-\beta} / \max_j w_j\) to correct the bias introduced by non-uniform sampling.

Concept and real-world RL

Prioritized Experience Replay (PER) samples transitions with probability proportional to their priority, often the absolute TD error, so that surprising or informative transitions are replayed more often. This can speed up learning, but it introduces bias: the update distribution no longer matches the uniform replay distribution. Importance-sampling weights correct for this by scaling each gradient update so that, in expectation, the uniform case is recovered. A sum tree supports both sampling and priority updates in O(log N). PER is a component of Rainbow and other sample-efficient DQN variants. ...
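The sum tree mentioned above can be sketched as follows: internal nodes store the sum of their children's priorities, so the root holds the total mass, and both updating a leaf and locating the leaf that covers a given cumulative mass take O(log N). This is a minimal illustrative implementation, not code from the original post; the class and method names are assumptions.

```python
class SumTree:
    """Binary sum tree over `capacity` leaves. Internal nodes hold the
    sum of their children, so tree[1] (the root) is the total priority."""

    def __init__(self, capacity):
        self.capacity = capacity
        # 1-indexed heap layout: leaves live at positions [capacity, 2*capacity).
        self.tree = [0.0] * (2 * capacity)

    def update(self, index, priority):
        """Set the priority of leaf `index`, then refresh its ancestors: O(log N)."""
        pos = index + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def total(self):
        """Total priority mass (the root of the tree)."""
        return self.tree[1]

    def sample(self, mass):
        """Return the leaf index whose cumulative-priority interval contains
        `mass`, where 0 <= mass <= total(). Descends one level per step."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if mass <= self.tree[left]:
                pos = left              # answer lies in the left subtree
            else:
                mass -= self.tree[left]  # skip the left subtree's mass
                pos = left + 1
        return pos - self.capacity
```

In a full PER buffer, `update` is called with the new \(p_i^\alpha\) after each learning step, and a batch is typically drawn by splitting `total()` into equal segments and sampling one uniform mass per segment.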
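The sampling probabilities and importance-sampling weights can be made concrete with a small self-contained sketch. It uses a linear scan via `np.random.choice` rather than a sum tree (fine for illustration; the sum tree replaces this at scale). The function name, the `eps` floor, and the defaults `alpha=0.6`, `beta=0.4` are illustrative assumptions (those exponents are commonly used starting values; \(\beta\) is usually annealed toward 1 over training).

```python
import numpy as np

def sample_prioritized(td_errors, batch_size, alpha=0.6, beta=0.4,
                       eps=1e-6, rng=None):
    """Sample indices with P(i) = p_i^alpha / sum_k p_k^alpha, where
    p_i = |delta_i| + eps, and return the normalized IS weights
    w_i = (N * P(i))^(-beta) / max_j w_j. Illustrative sketch only."""
    rng = np.random.default_rng() if rng is None else rng
    priorities = (np.abs(td_errors) + eps) ** alpha  # eps keeps every transition sampleable
    probs = priorities / priorities.sum()            # P(i), sums to 1
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    n = len(probs)
    weights = (n * probs[idx]) ** (-beta)            # w_i = (N * P(i))^(-beta)
    # Normalize by the largest possible weight (attained at the smallest P(i))
    # so that all weights are <= 1 and only scale updates downward.
    weights /= (n * probs.min()) ** (-beta)
    return idx, weights
```

The returned `weights` multiply each sampled transition's loss (or gradient) term, so frequently sampled high-priority transitions contribute proportionally smaller updates and the expected update matches the uniform-replay case as \(\beta \to 1\).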

March 10, 2026 · 3 min · 633 words · codefrydev