Bandits
Using optimistic initial Q-values to encourage early exploration in multi-armed bandits.
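A minimal sketch of the idea, assuming Gaussian rewards and a purely greedy learner (the function name and parameters are illustrative, not from the lesson): initializing every Q-value above any plausible reward makes each untried arm look best, so even a greedy policy samples all arms before settling.

```python
import random

def optimistic_greedy(true_means, steps=1000, q0=5.0):
    """Greedy action selection with optimistically initialized Q-values."""
    k = len(true_means)
    q = [q0] * k          # optimistic initial estimates (above any real reward)
    n = [0] * k           # pull counts per arm
    total = 0.0
    for _ in range(steps):
        a = max(range(k), key=lambda i: q[i])  # pure greedy, no epsilon
        r = random.gauss(true_means[a], 1.0)   # assumed Gaussian reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]              # incremental sample-mean update
        total += r
    return q, total
```

Because each pull drags an arm's estimate down toward its true mean, disappointment with one arm pushes the greedy choice onto the still-optimistic ones, which is exactly the early exploration the lesson describes.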
Sample mean, variance, expectation, and law of large numbers — with bandit-style problems and explained solutions.
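These statistics can be checked empirically in a few lines (the setup below, Uniform(0, 1) draws and the variable names, is illustrative): by the law of large numbers the sample mean approaches the expectation 1/2, and the sample variance approaches 1/12.

```python
import random
import statistics

random.seed(42)
n = 100_000
samples = [random.random() for _ in range(n)]  # Uniform(0, 1): E[X] = 1/2, Var[X] = 1/12

sample_mean = statistics.fmean(samples)     # converges to 0.5 as n grows
sample_var = statistics.variance(samples)   # unbiased sample variance, ~0.0833
```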
Upper Confidence Bound (UCB1) algorithm for multi-armed bandits—balance exploration and exploitation using uncertainty.
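A compact sketch of UCB1 for Bernoulli arms (the function name and reward model are assumptions, but the selection rule is the standard one): each arm's score is its sample mean plus an uncertainty bonus sqrt(2 ln t / n_i), so rarely pulled arms keep getting revisited until their bonus shrinks.

```python
import math
import random

def ucb1(true_means, steps=2000):
    """UCB1: play the arm maximizing q[i] + sqrt(2 * ln t / n[i])."""
    k = len(true_means)
    q = [0.0] * k   # sample-mean reward per arm
    n = [0] * k     # pull counts per arm
    for t in range(1, steps + 1):
        if t <= k:
            a = t - 1  # play each arm once to initialize the counts
        else:
            a = max(range(k),
                    key=lambda i: q[i] + math.sqrt(2 * math.log(t) / n[i]))
        r = 1.0 if random.random() < true_means[a] else 0.0  # Bernoulli reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]
    return n
```

Over time the pull counts concentrate on the best arm while every arm is still sampled at a logarithmic rate.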
Bayesian bandits and Thompson Sampling—sample from the posterior to balance exploration and exploitation.
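A minimal Thompson Sampling sketch for Bernoulli rewards with Beta(1, 1) priors (the function name is illustrative): draw one sample from each arm's posterior, play the argmax, and update that arm's Beta parameters with the observed success or failure.

```python
import random

def thompson_bernoulli(true_means, steps=2000):
    """Thompson Sampling: Bernoulli arms, conjugate Beta posteriors."""
    k = len(true_means)
    alpha = [1] * k   # prior successes + 1
    beta = [1] * k    # prior failures + 1
    counts = [0] * k
    for _ in range(steps):
        # one posterior draw per arm; play the arm with the largest draw
        theta = [random.betavariate(alpha[i], beta[i]) for i in range(k)]
        a = max(range(k), key=lambda i: theta[i])
        r = 1 if random.random() < true_means[a] else 0
        alpha[a] += r
        beta[a] += 1 - r
        counts[a] += 1
    return counts
```

Arms with wide posteriors occasionally produce the largest draw, which is how the posterior sampling itself balances exploration against exploitation.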
When reward distributions change over time—exponential recency-weighted average and constant step size.
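The constant-step-size update can be sketched in one function (the name `track` is illustrative): with Q ← Q + α (R − Q) and fixed α, each past reward's weight decays geometrically, so the estimate follows a drifting reward distribution instead of averaging over its whole history.

```python
def track(rewards, alpha=0.1, q0=0.0):
    """Exponential recency-weighted average with constant step size alpha."""
    q = q0
    for r in rewards:
        q += alpha * (r - q)  # recent rewards dominate; old ones decay by (1 - alpha)
    return q
```

After a sudden shift in the reward mean, the estimate forgets the old regime at rate (1 − α) per step, whereas a sample-mean learner would be anchored to all of its past data.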
When to implement bandits from scratch vs. use existing libraries—learning goals and control.