Probability & Statistics

This page covers the probability and statistics you need for RL: expectations, variance, sample means, and the idea that sample averages converge to expectations. Back to Math for RL. Core concepts Random variables and expectation A random variable \(X\) takes values according to some distribution. The expected value (or expectation) \(\mathbb{E}[X]\) is the long-run average if you repeat the experiment infinitely many times. For a discrete \(X\) with outcomes \(x_i\) and probabilities \(p_i\): \(\mathbb{E}[X] = \sum_i x_i p_i\). For a continuous distribution with density \(p(x)\): \(\mathbb{E}[X] = \int x,p(x),dx\) (you will mostly see discrete or simple continuous cases in RL). In reinforcement learning: The return (sum of discounted rewards) is a random variable because rewards and transitions can be random. The value function \(V(s)\) is the expected return from state \(s\). Multi-armed bandits: each arm has an expected reward; we estimate it from samples. ...

March 10, 2026 · 8 min · 1699 words · codefrydev