Phase 1 Self-Check: Math for RL

Use this self-check after completing Probability, Linear algebra, and Calculus. If you can answer at least 8 correctly and feel comfortable with the concepts, you are ready for Phase 2 and the curriculum.

1. Probability

Q: In a bandit, you pull arm 2 five times and get rewards [0.5, 1.2, 0.8, 1.0, 0.9]. What is the sample mean? What is the unbiased sample variance (use \(n-1\) in the denominator)?

Answer

Step 1 (sample mean): Sum = 0.5 + 1.2 + 0.8 + 1.0 + 0.9 = 4.4; mean = 4.4/5 = 0.88. ...
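Both quantities from the bandit question can be checked with a few lines of standard-library Python. This is a minimal sketch; using the `statistics` module as a cross-check is our choice, not something the quiz prescribes. `statistics.variance` divides by \(n-1\), so it should agree with the hand-written formula.

```python
from statistics import variance

# Rewards from five pulls of arm 2 (taken from the question)
rewards = [0.5, 1.2, 0.8, 1.0, 0.9]
n = len(rewards)

# Sample mean: sum of rewards divided by n
sample_mean = sum(rewards) / n

# Unbiased sample variance: sum of squared deviations divided by n - 1
unbiased_var = sum((r - sample_mean) ** 2 for r in rewards) / (n - 1)

# statistics.variance also uses the n - 1 denominator, so the two should match
assert abs(unbiased_var - variance(rewards)) < 1e-12

print(f"mean={sample_mean:.2f}, variance={unbiased_var:.3f}")
```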

March 10, 2026 · 5 min · 858 words · codefrydev

Phase 2 Readiness Quiz

Use this quiz after working through Python, NumPy, and PyTorch (and optionally Gym). If you can answer at least 6 correctly, you are ready for Phase 3 and Volume 1.

1. Python

Q: What is the output of [x**2 for x in range(4)]?

Answer

Step 1: range(4) gives 0, 1, 2, 3. Step 2: x**2 for each gives 0, 1, 4, 9. Answer: [0, 1, 4, 9]. List comprehensions are used throughout the curriculum for building lists from trajectories (e.g. rewards, returns). ...
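The same pattern carries over to trajectories. The sketch below assumes a hypothetical trajectory stored as a list of (state, action, reward) tuples; that layout and the values in it are made up for illustration and may not match how the curriculum stores episodes. It pulls out the rewards and computes the undiscounted reward-to-go from each step, each with a single list comprehension.

```python
# The quiz expression: square each element of range(4)
squares = [x**2 for x in range(4)]
print(squares)  # [0, 1, 4, 9]

# Hypothetical toy trajectory of (state, action, reward) tuples, made up for illustration
trajectory = [("s0", 1, 0.0), ("s1", 0, 0.5), ("s2", 1, 1.0)]

# Extract just the rewards
rewards = [r for (_, _, r) in trajectory]

# Undiscounted reward-to-go: sum of rewards from step t to the end of the episode
reward_to_go = [sum(rewards[t:]) for t in range(len(rewards))]

print(rewards)       # [0.0, 0.5, 1.0]
print(reward_to_go)  # [1.5, 1.5, 1.0]
```

Slicing `rewards[t:]` makes this quadratic in episode length, which is fine for a toy example; longer episodes are better served by a single backward pass.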

March 10, 2026 · 4 min · 656 words · codefrydev

Phase 3 Foundations Quiz

Use this quiz after completing Volume 1 and Volume 2 (or the Phase 3 mini-project). If you can answer at least 12 of 15 correctly, you are ready for Phase 4 and Volume 3.

1. RL framework

Q: Name the four main components of an RL system (agent, environment, and two more). What is a state?

Answer

Agent, environment, action, reward. State: a representation of the current situation the agent uses to choose actions.

2. Return

Q: For rewards [0, 0, 1] and \(\gamma = 0.9\), compute the discounted return \(G_0\) from step 0. ...
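The discounted return \(G_0 = \sum_k \gamma^k r_k\) can be computed directly from the definition or with the backward recursion \(G_t = r_t + \gamma G_{t+1}\). A minimal Python sketch of both follows; the function name `discounted_return` is ours, not something defined in the quiz.

```python
def discounted_return(rewards, gamma):
    """Return G_0 = r_0 + gamma*r_1 + gamma^2*r_2 + ... for one episode."""
    g = 0.0
    # Work backwards using G_t = r_t + gamma * G_{t+1}, with G = 0 after the last step
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [0, 0, 1]
gamma = 0.9

# Direct form of the definition, as a cross-check on the recursion
g0_direct = sum(gamma**k * r for k, r in enumerate(rewards))

print(discounted_return(rewards, gamma), g0_direct)
```

The backward form is the one that scales: it computes every \(G_t\) along the way in a single pass instead of re-summing from each step.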

March 10, 2026 · 5 min · 876 words · codefrydev

Phase 4 Deep RL Quiz

Use this quiz after completing Volumes 3–5 (or the Phase 4 coding challenges). If you can answer at least 9 of 12 correctly, you are ready for Phase 5 and Volume 6.

1. Function approximation

Q: Why is function approximation necessary in RL for large or continuous state spaces?

Answer

Tabular methods store one value per state (or state-action pair), and the number of states can be huge or infinite. Function approximation uses a parameterized function (e.g. a neural network), so a fixed number of parameters represents values for all states and generalizes from seen to unseen states. ...
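To make the parameter-count argument concrete, here is a minimal sketch of linear function approximation in plain Python. Everything in it is hypothetical: a one-dimensional continuous state in [0, 1], a three-element polynomial feature map, and made-up targets \(V(s) = 2s\) standing in for real returns. The weights are fit by stochastic gradient descent on squared error, which is a supervised stand-in; an RL agent would use Monte Carlo or bootstrapped targets instead. A table would need one entry per state, but here three weights cover the whole interval and still produce an estimate for a state never seen in training.

```python
# Hypothetical feature map for a continuous state s in [0, 1].
# The number of features (3) stays fixed no matter how many distinct states exist.
def features(s):
    return [1.0, s, s * s]

def value(w, s):
    # Linear function approximation: V(s) is approximated by the dot product w . phi(s)
    return sum(wi * xi for wi, xi in zip(w, features(s)))

# Made-up training data: pretend the true value function is V(s) = 2s
train_states = [0.0, 0.25, 0.5, 1.0]
targets = [2 * s for s in train_states]

w = [0.0, 0.0, 0.0]  # three parameters shared by every s in [0, 1]
alpha = 0.1          # step size

for _ in range(10000):
    for s, target in zip(train_states, targets):
        error = target - value(w, s)
        # SGD step on 0.5 * error**2: w <- w + alpha * error * phi(s)
        w = [wi + alpha * error * xi for wi, xi in zip(w, features(s))]

# s = 0.75 never appears in training, yet the shared weights still give an estimate
print(round(value(w, 0.75), 3))
```

A neural network, as mentioned in the answer, replaces the hand-picked feature map with learned features, but the principle is the same: one fixed parameter vector is shared across all states.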

March 10, 2026 · 4 min · 814 words · codefrydev