Phase 1 Self-Check: Math for RL

Use this self-check after completing Probability, Linear algebra, and Calculus. If you can answer at least 8 correctly and feel comfortable with the concepts, you are ready for Phase 2 and the curriculum. 1. Probability Q: In a bandit, you pull arm 2 five times and get rewards [0.5, 1.2, 0.8, 1.0, 0.9]. What is the sample mean? What is the unbiased sample variance (use \(n-1\) in the denominator)? Answer Step 1 — Sample mean: Sum = 0.5 + 1.2 + 0.8 + 1.0 + 0.9 = 4.4; mean = 4.4/5 = 0.88. ...
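To check the arithmetic in the probability question, here is a minimal Python sketch. It assumes nothing beyond the five rewards listed in the question; the variable names are illustrative only.

```python
# Rewards observed from arm 2, as given in the question.
rewards = [0.5, 1.2, 0.8, 1.0, 0.9]
n = len(rewards)

# Sample mean: sum of rewards divided by the number of pulls.
sample_mean = sum(rewards) / n

# Unbiased sample variance: squared deviations divided by (n - 1).
sample_var = sum((r - sample_mean) ** 2 for r in rewards) / (n - 1)

print(round(sample_mean, 4))  # 0.88
print(round(sample_var, 4))   # 0.067
```

Dividing by n - 1 rather than n corrects for the fact that the deviations are measured from the sample mean, not the true mean.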

March 10, 2026 · 5 min · 858 words · codefrydev

Phase 2 Readiness Quiz

Use this quiz after working through Python, NumPy, and PyTorch (and optionally Gym). If you can answer at least 6 correctly, you are ready for Phase 3 and Volume 1. 1. Python Q: What is the output of [x**2 for x in range(4)]? Answer Step 1: range(4) gives 0, 1, 2, 3. Step 2: x**2 for each gives 0, 1, 4, 9. Answer: [0, 1, 4, 9]. List comprehensions are used throughout the curriculum for building lists from trajectories (e.g. rewards, returns). ...
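As a small illustration of the list-comprehension pattern from the Python question, the sketch below reproduces the quiz answer and then pulls rewards out of a trajectory. The `trajectory` tuples are hypothetical, invented here for illustration, and not taken from the curriculum.

```python
# The quiz example: squares of 0, 1, 2, 3.
squares = [x**2 for x in range(4)]
print(squares)  # [0, 1, 4, 9]

# Hypothetical trajectory of (state, action, reward) tuples (illustrative only).
trajectory = [(0, "right", 0.0), (1, "right", 0.0), (2, "right", 1.0)]

# Same pattern: build a list of rewards from the trajectory.
rewards = [reward for (_state, _action, reward) in trajectory]
print(rewards)  # [0.0, 0.0, 1.0]
```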

March 10, 2026 · 4 min · 656 words · codefrydev

Phase 3 Foundations Quiz

Use this quiz after completing Volume 1 and Volume 2 (or the Phase 3 mini-project). If you can answer at least 12 of 15 correctly, you are ready for Phase 4 and Volume 3. 1. RL framework Q: Name the four main components of an RL system (agent, environment, and two more). What is a state? Answer Agent, environment, action, reward. State: a representation of the current situation the agent uses to choose actions. 2. Return Q: For rewards [0, 0, 1] and \(\gamma = 0.9\), compute the discounted return \(G_0\) from step 0. ...
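The discounted return in the second question can be checked with a short sketch, assuming the standard definition \(G_0 = \sum_k \gamma^k r_k\). The helper name `discounted_return` is an illustrative choice, not something defined by the curriculum.

```python
def discounted_return(rewards, gamma):
    """Return G_0 = sum over k of gamma**k * rewards[k]."""
    g = 0.0
    # Work backward so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

g0 = discounted_return([0, 0, 1], gamma=0.9)
print(round(g0, 4))  # 0.81
```

Only the final reward is nonzero, and it is discounted twice, so the result is 0.9 squared times 1.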

March 10, 2026 · 5 min · 876 words · codefrydev

Phase 4 Deep RL Quiz

Use this quiz after completing Volumes 3–5 (or the Phase 4 coding challenges). If you can answer at least 9 of 12 correctly, you are ready for Phase 5 and Volume 6. 1. Function approximation Q: Why is function approximation necessary in RL for large or continuous state spaces? Answer Tabular methods store one value per state (or state-action); the number of states can be huge or infinite. Function approximation uses a parameterized function (e.g. neural network) so a fixed number of parameters represent values for all states and generalize from seen to unseen states. ...
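To make the "fixed number of parameters" point concrete, here is a minimal sketch of a linear value function over a continuous scalar state. The polynomial features and the weight values are assumptions chosen for illustration; they are not learned and are not taken from the curriculum.

```python
def features(state):
    # Hypothetical polynomial features of a scalar, continuous state.
    return [1.0, state, state ** 2]

def value(state, weights):
    # Linear function approximation: dot product of weights and features.
    return sum(w * f for w, f in zip(weights, features(state)))

# Three parameters cover every real-valued state (values are illustrative, not learned).
weights = [0.5, -1.0, 2.0]

# The same weights give an estimate for any state, including ones never visited.
values = [value(s, weights) for s in (0.0, 0.25, 3.7)]
print(values[:2])  # [0.5, 0.375]
```

A tabular method would need one entry per state, which is impossible here because the state is continuous; the parameter count above stays at three regardless.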

March 10, 2026 · 4 min · 814 words · codefrydev

Worked Solutions Index

This page points you to all places where worked solutions (step-by-step answers, derivations, and code) are available. Use it to check your work or to study from full solutions.

Math for RL

Each topic page has practice questions with full solutions in collapsible “Answer and explanation” sections:

Probability & statistics: Sample mean, variance, expectation, law of large numbers, bandit-style problems.
Linear algebra: Dot product, matrix-vector product, gradients, NumPy.
Calculus: Derivatives, chain rule, partial derivatives, policy gradient.

Every practice question includes a step-by-step derivation and a short “In RL” explanation. ...

March 10, 2026 · 2 min · 285 words · codefrydev