Calculus

This page covers the calculus you need for RL: derivatives, the chain rule, and partial derivatives. Back to Math for RL. Core concepts Derivatives The derivative of \(f(x)\) with respect to \(x\) is written \(f'(x)\) or \(\frac{df}{dx}\). It gives the rate of change (slope) of \(f\) at \(x\). Rules you will use: \(\frac{d}{dx} x^n = n x^{n-1}\); \(\frac{d}{dx} e^x = e^x\); \(\frac{d}{dx} \ln x = \frac{1}{x}\); \(\frac{d}{dx} \ln(1 + e^x) = \frac{e^x}{1+e^x} = \sigma(x)\) (the sigmoid). The chart below shows the sigmoid \(\sigma(x) = \frac{e^x}{1+e^x}\): the S-shaped function that appears as the derivative of softplus and in policy parameterizations. ...

March 10, 2026 · 8 min · 1554 words · codefrydev
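The derivative rules in the excerpt above can be sanity-checked numerically. A minimal sketch (the function names `softplus`, `sigmoid`, and `numerical_derivative` are illustrative, not from the article): a central finite difference applied to \(\ln(1+e^x)\) should match \(\sigma(x)\).

```python
import math

def sigmoid(x):
    # sigma(x) = e^x / (1 + e^x), written in a numerically stable form
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    # ln(1 + e^x); log1p improves accuracy for small e^x
    return math.log1p(math.exp(x))

def numerical_derivative(f, x, h=1e-6):
    # central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2.0 * h)

# d/dx ln(1 + e^x) = sigmoid(x): the two values agree to high precision
x = 0.7
print(numerical_derivative(softplus, x), sigmoid(x))
```

The same check works for the other rules, e.g. comparing `numerical_derivative(math.exp, x)` against `math.exp(x)`.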

Linear Algebra

This page covers the linear algebra you need for RL: vectors, dot products, matrices, matrix-vector multiplication, and the idea of gradients. Back to Math for RL. Core concepts Vectors A vector is an ordered list of numbers, e.g. \(x = [x_1, x_2, x_3]^T\) (column vector). We treat it as a column by default. The dot product of two vectors \(x\) and \(y\) of the same length is \(x^T y = \sum_i x_i y_i\). Geometrically, it is related to the angle between the vectors and their lengths: \(x^T y = |x| |y| \cos\theta\). ...

March 10, 2026 · 9 min · 1736 words · codefrydev
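The dot-product identity \(x^T y = |x| |y| \cos\theta\) from the excerpt above can be sketched in a few lines of plain Python (the helper names `dot` and `norm` are illustrative, not from the article):

```python
import math

def dot(x, y):
    # x^T y = sum_i x_i * y_i
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    # Euclidean length |x| = sqrt(x^T x)
    return math.sqrt(dot(x, x))

x = [1.0, 0.0]
y = [1.0, 1.0]

# Recover the angle between x and y from the dot product
cos_theta = dot(x, y) / (norm(x) * norm(y))
theta_deg = math.degrees(math.acos(cos_theta))
print(theta_deg)  # the angle between [1,0] and [1,1] is 45 degrees
```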

Probability & Statistics

This page covers the probability and statistics you need for RL: expectations, variance, sample means, and the idea that sample averages converge to expectations. Back to Math for RL. Core concepts Random variables and expectation A random variable \(X\) takes values according to some distribution. The expected value (or expectation) \(\mathbb{E}[X]\) is the long-run average if you repeat the experiment infinitely many times. For a discrete \(X\) with outcomes \(x_i\) and probabilities \(p_i\): \(\mathbb{E}[X] = \sum_i x_i p_i\). For a continuous distribution with density \(p(x)\): \(\mathbb{E}[X] = \int x \, p(x) \, dx\) (you will mostly see discrete or simple continuous cases in RL). In reinforcement learning: The return (sum of discounted rewards) is a random variable because rewards and transitions can be random. The value function \(V(s)\) is the expected return from state \(s\). Multi-armed bandits: each arm has an expected reward; we estimate it from samples. ...

March 10, 2026 · 8 min · 1699 words · codefrydev
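The claim above, that sample averages converge to expectations, can be sketched with a small simulation (the distribution and variable names here are illustrative, not from the article):

```python
import random

random.seed(0)

# A discrete random variable: outcomes x_i with probabilities p_i
outcomes = [0.0, 1.0, 10.0]
probs = [0.5, 0.3, 0.2]

# Exact expectation: E[X] = sum_i x_i * p_i = 0.0 + 0.3 + 2.0 = 2.3
expectation = sum(x * p for x, p in zip(outcomes, probs))

# Sample mean over many draws approaches E[X] (law of large numbers)
n = 100_000
samples = random.choices(outcomes, weights=probs, k=n)
sample_mean = sum(samples) / n

print(expectation, sample_mean)  # the two values should be close
```

This is exactly the estimator used for bandit arms: average the observed rewards to approximate each arm's expected reward.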