Step Size

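When reward distributions change over time (a nonstationary problem), a constant step size turns the value estimate into an exponential recency-weighted average: recent rewards count more than old ones, so the estimate can track the change.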
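The difference between the two update rules can be sketched in a few lines. This is a minimal illustration, not the lesson's own code: the function names, the step size alpha = 0.1, and the toy reward sequence (100 rewards of 0 followed by 100 rewards of 1) are all assumptions chosen for the example. The constant-step update is Q <- Q + alpha * (R - Q), which places weight alpha * (1 - alpha)^(n - i) on the i-th of n rewards.

```python
def sample_average(rewards, q0=0.0):
    """Incremental sample average: step size 1/n, every reward weighted equally."""
    q = q0
    for n, r in enumerate(rewards, start=1):
        q += (r - q) / n
    return q


def constant_step(rewards, alpha=0.1, q0=0.0):
    """Constant step size: Q <- Q + alpha * (R - Q)."""
    q = q0
    for r in rewards:
        q += alpha * (r - q)
    return q


def recency_weights(n, alpha):
    """Weight on the initial estimate, then the weights on rewards R_1..R_n."""
    w_init = (1 - alpha) ** n
    w_rewards = [alpha * (1 - alpha) ** (n - i) for i in range(1, n + 1)]
    return w_init, w_rewards


# Hypothetical nonstationary reward stream: the mean jumps from 0 to 1 halfway.
rewards = [0.0] * 100 + [1.0] * 100

avg_estimate = sample_average(rewards)               # averages over both regimes
const_estimate = constant_step(rewards, alpha=0.1)   # follows the recent regime

w_init, w_rewards = recency_weights(10, alpha=0.1)

print(f"sample average : {avg_estimate:.3f}")
print(f"constant step  : {const_estimate:.3f}")
print(f"weights sum    : {w_init + sum(w_rewards):.3f}")
```

The sample average stays anchored to the old rewards, while the constant-step estimate ends up close to the new mean. The weights decay geometrically with the age of the reward and, together with the weight on the initial estimate, sum to 1.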

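A large step size can cause policy collapse in a bandit: the action probabilities saturate on a single arm after very few updates, and exploration effectively stops. Visualizing the probabilities makes the collapse easy to see.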
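One way to reproduce the effect is with a softmax (gradient) bandit. The sketch below assumes that setting, which the lesson text does not confirm; the arm means, the two step sizes, the zero baseline, and the text-bar display are likewise illustrative choices. With a large alpha, a single rewarded pull pushes one preference so far ahead that its softmax probability is essentially 1, whether or not that arm is the best one.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(h - m) for h in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_update(prefs, action, reward, alpha, baseline=0.0):
    """One gradient-bandit step: H_a += alpha * (R - baseline) * (1[a == A] - pi_a)."""
    probs = softmax(prefs)
    return [
        h + alpha * (reward - baseline) * ((1.0 if a == action else 0.0) - p)
        for a, (h, p) in enumerate(zip(prefs, probs))
    ]


def run(means, alpha, steps=200, seed=0):
    """Run a Bernoulli bandit and record the policy after every step."""
    rng = random.Random(seed)
    prefs = [0.0] * len(means)
    history = [softmax(prefs)]
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(means)), weights=probs)[0]
        reward = 1.0 if rng.random() < means[action] else 0.0
        prefs = gradient_update(prefs, action, reward, alpha)
        history.append(softmax(prefs))
    return history


def show(probs, width=40):
    """Print one text bar per arm, proportional to its probability."""
    for a, p in enumerate(probs):
        print(f"arm {a}: {'#' * round(p * width):<{width}} {p:.3f}")


ARM_MEANS = [0.2, 0.5, 0.8]  # hypothetical success probabilities per arm

small = run(ARM_MEANS, alpha=0.1)
large = run(ARM_MEANS, alpha=50.0)

print("alpha = 0.1, final policy")
show(small[-1])
print("alpha = 50, final policy")
show(large[-1])
```

Printing `show(large[t])` for the first few values of `t` shows how quickly the policy locks in. Because the collapse happens on the first rewarded pull, the arm it settles on depends on the random seed rather than on the arm means.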