Who is this for? This assessment checks whether you have the math, Python, NumPy, PyTorch, and basic RL concepts needed to start the curriculum comfortably.
New to programming? Start with Phase 0 of the Learning path. Unsure about math? Try the Math for RL track first. After that, use this assessment to see if you are ready for the curriculum.
25 questions to assess your foundational knowledge before the 100-chapter reinforcement learning curriculum. Answer honestly; each topic page has solutions and explanations to help you understand the “why,” not just the “what.” If you can answer at least 20 correctly and feel comfortable with the concepts, you are ready to start. If you struggled with many, review Prerequisites or the Learning path and come back.
Recommended order#
Work through the topics in this order for a logical flow: math foundations → programming (Python, NumPy, PyTorch) → RL framework and value functions → tabular methods → function approximation and deep RL → self-assessment.
- Probability & statistics (Q1–Q2)
- Linear algebra (Q3–Q4)
- Calculus (Q5–Q6)
- Python basics (Q7)
- NumPy (Q8)
- PyTorch basics (Q9)
- RL framework (Q10–Q13)
- Value functions & Bellman (Q14–Q15)
- Tabular methods (Q16–Q19)
- Function approximation & Deep RL (Q20–Q24)
- Final self-assessment (Q25)
Syllabus at a glance#
| Topic | What you’ll do | Questions |
|---|---|---|
| Probability & statistics | Sample mean, variance, expectation vs sample average, law of large numbers; bandit-style problems and code | Q1, Q2 |
| Linear algebra | Dot product, matrix-vector product, \(\nabla_w (Aw)\); NumPy snippet | Q3, Q4 |
| Calculus | Derivatives, chain rule, sigmoid; small code check | Q5, Q6 |
| Python basics | Moving average, list comprehensions, dict of returns | Q7 |
| NumPy | Create array, set row, element-wise product; slices and shapes | Q8 |
| PyTorch basics | Tensors, requires_grad, backward(), autograd examples | Q9 |
| RL framework | Agent, environment, state, action, reward; Markov; exploration-exploitation; \(\gamma\) | Q10–Q13 |
| Value functions & Bellman | \(V^\pi(s)\), \(Q^\pi(s,a)\); Bellman expectation equation; tiny MDP | Q14, Q15 |
| Tabular methods | Policy iteration, value iteration; MC vs TD; on-policy vs off-policy; Q-learning update | Q16–Q19 |
| Function approximation & Deep RL | Why FA; policy gradient update; ε-greedy, noisy nets; experience replay; actor-critic | Q20–Q24 |
| Final self-assessment | Rate comfort in Python, math, and RL; links to review | Q25 |
Each topic page includes worked problems with explanations, code examples with explanations, and math examples with step-by-step reasoning. Use them to fill gaps before starting the Curriculum.
This page covers the calculus you need for the preliminary assessment: derivatives of common functions, the chain rule, and how they appear in logistic regression and policy gradients. Back to Preliminary.
Why this matters for RL: Policy gradients and loss-based updates use derivatives and the chain rule. You don’t need to derive everything by hand in practice (autograd does it), but you need to understand what a gradient is and how it’s used. The sigmoid and chain rule appear in logistic policies and in backpropagation.
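As a small sketch of the kind of chain-rule fact the assessment expects, here is the sigmoid derivative \(\sigma'(x) = \sigma(x)(1 - \sigma(x))\) checked against a finite-difference estimate (the value of `x` is just an illustrative choice):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Chain-rule result: d sigma / dx = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1 - s)

# Central finite-difference check at x = 0.5
x, h = 0.5, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid_grad(x)
```

If the two values agree to several decimal places, the analytic derivative is correct; autograd automates exactly this bookkeeping for whole networks.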
...
This is the final self-assessment step of the preliminary material. Use it to reflect on your readiness and to find gaps before starting the 100-chapter curriculum. Back to Preliminary.
Why this step matters: The curriculum assumes comfort with probability, linear algebra, calculus, Python, NumPy, PyTorch, and basic RL ideas. If you are weak in one area, you can still start, but you’ll progress more smoothly if you strengthen those areas first. This page helps you identify where to spend a bit more time.
...
This page covers function approximation and deep RL concepts you need for the preliminary assessment: why we need FA, the policy gradient update, exploration in DQN, experience replay, and the advantage of actor-critic. Back to Preliminary.
Why this matters for RL: In large or continuous state spaces we cannot store a value per state; we use a parameterized function (e.g. a neural network) to approximate values or policies. That leads to policy gradient methods (maximize return) and value-based methods with FA (e.g. DQN). DQN uses experience replay and exploration (e.g. ε-greedy); actor-critic combines a policy (actor) and a value function (critic) for lower-variance policy gradients. You need to understand why FA is necessary and how these pieces fit together.
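Two of those pieces, experience replay and ε-greedy exploration, fit in a few lines each. This is a minimal sketch, not DQN itself (names like `ReplayBuffer` and `epsilon_greedy` are illustrative, not from a library):

```python
import random
from collections import deque

class ReplayBuffer:
    """Store transitions; sample random minibatches to break correlation."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In DQN the sampled minibatch feeds a TD loss on the Q-network; with `epsilon = 0` the policy is purely greedy, and annealing ε trades exploration for exploitation over training.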
...
This page covers the linear algebra you need for the preliminary assessment: dot product, matrix-vector multiplication, and gradients with respect to vectors. Back to Preliminary.
Why this matters for RL: States and observations are often vectors; linear value approximation uses \(V(s) \approx w^T x(s)\); neural networks are built from matrix-vector products and gradients. You need to compute dot products and \(\nabla_w (Aw)\) by hand and understand their geometric meaning.
Learning objectives: Compute dot products and matrix-vector products; state \(\nabla_w (Aw) = A^T\) (for column gradient); relate these to state vectors and value approximation.
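The dot product and matrix-vector product above can be sketched in NumPy (the numbers are illustrative, not from the assessment):

```python
import numpy as np

# Linear value approximation: V(s) ≈ w^T x(s), a dot product
x = np.array([1.0, 2.0, 3.0])      # feature vector x(s)
w = np.array([0.5, -1.0, 2.0])     # weight vector
v = w @ x                          # 0.5*1 - 1.0*2 + 2.0*3 = 4.5

# Matrix-vector product, the building block of a network layer
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])
y = A @ w                          # shape (2,): [4.5, -1.0]
```

Checking the shapes (`A` is 2×3, `w` has 3 entries, `y` has 2) is the same discipline you need when reasoning about \(\nabla_w (Aw)\): the gradient's shape must match `w`'s role in the product.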
...
This page covers the NumPy you need for the preliminary assessment: creating arrays, indexing, slicing, and element-wise operations. Back to Preliminary.
Why this matters for RL: Environments return observations as arrays; neural networks consume batches of arrays. NumPy is the standard way to represent states, reward vectors, and batches of transitions. You need to create and reshape arrays, slice them, and know the difference between element-wise and matrix multiplication.
Learning objectives: Create and index NumPy arrays; set rows/columns; compute element-wise products and matrix-vector products; use np.dot or @ correctly.
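A quick sketch of those operations, including the element-wise (`*`) vs matrix (`@`) distinction (array contents are illustrative):

```python
import numpy as np

a = np.zeros((3, 4))        # create a 3x4 array of zeros
a[1] = np.arange(4)         # set the second row to [0, 1, 2, 3]

b = np.ones((3, 4))
elementwise = a * b         # element-wise product, keeps shape (3, 4)

M = np.arange(12).reshape(3, 4)
v = np.array([1.0, 0.0, 1.0, 0.0])
mv = M @ v                  # matrix-vector product, shape (3,)
```

`a * b` multiplies matching entries; `M @ v` (equivalently `np.dot(M, v)`) contracts `M`'s columns against `v`, so the shapes must agree along that axis.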
...
This page covers the probability and statistics you need for the preliminary assessment: sample mean, unbiased sample variance, expectation vs sample average, and the law of large numbers. Back to Preliminary.
Why this matters for RL: In reinforcement learning, rewards are often random and value functions are expected returns. Bandits, Monte Carlo methods, and policy evaluation all rely on expectations and sample averages. You need to compute and interpret sample means and variances by hand and in code.
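A minimal sketch of the expectation-vs-sample-average idea: draw many samples and watch the sample mean and unbiased variance approach the true values (the distribution and seed here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# True distribution: E[X] = 1.0, Var[X] = 2.0**2 = 4.0
samples = rng.normal(loc=1.0, scale=2.0, size=100_000)

mean = samples.mean()        # sample mean, close to E[X] = 1.0
var = samples.var(ddof=1)    # unbiased sample variance (n-1), close to 4.0
```

This is the law of large numbers in action, and exactly what a Monte Carlo value estimate does: average sampled returns to approximate an expectation.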
...
This page covers the Python you need for the preliminary assessment: writing functions, working with lists, and using list comprehensions. Back to Preliminary.
Why this matters for RL: RL code is full of trajectories (lists of states, actions, rewards), configs (dicts), and custom types (agents, buffers). You need to write clear functions, slice sequences, and aggregate data. Moving averages and rolling computations appear when processing reward sequences or returns.
Learning objectives: Write a function that returns the moving average of a list; use list comprehensions and loops; structure code for clarity and reuse.
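A minimal moving-average sketch using a list comprehension and slicing, the combination the assessment asks about (the input list is illustrative):

```python
def moving_average(xs, window):
    """Return the moving average of xs over a sliding window."""
    # One slice per window position; len(xs) - window + 1 positions total
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]

moving_average([1, 2, 3, 4, 5], 2)  # [1.5, 2.5, 3.5, 4.5]
```

In practice you would apply this to a list of episode rewards to smooth a learning curve before plotting it.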
...
This page covers the PyTorch you need for the preliminary assessment: creating tensors, enabling gradients, and computing gradients with backward(). Back to Preliminary.
Why this matters for RL: States, actions, and batches are tensors; policy and value networks use nn.Module and autograd. Policy gradient and value losses require gradients; backward() and .grad are central. You need to create tensors with requires_grad=True and interpret the gradients PyTorch computes.
Learning objectives: Create a tensor with requires_grad=True; compute a scalar function of it and call backward(); read the gradient from .grad. Relate this to loss minimization and policy gradient.
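The whole autograd loop the objectives describe fits in a few lines (the function \(y = x^2\) is just an illustrative scalar loss):

```python
import torch

x = torch.tensor(3.0, requires_grad=True)  # leaf tensor tracked by autograd
y = x ** 2                                 # scalar function of x
y.backward()                               # autograd computes dy/dx
grad = x.grad                              # dy/dx = 2x = 6.0 at x = 3
```

A training step is this pattern plus an optimizer: compute a scalar loss from the network output, call `backward()`, then step in the direction opposite the gradient.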
...
This page covers the core RL framework you need for the preliminary assessment: the four main components, the Markov property, exploration vs exploitation, and the discount factor. Back to Preliminary.
Why this matters for RL: Every RL problem is defined by who acts (agent), what they interact with (environment), what they observe (state), what they can do (actions), and what feedback they get (reward). The Markov property and the discount factor shape how we define value functions and algorithms. Exploration vs exploitation is the central tension in learning from experience.
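One place the discount factor \(\gamma\) shows up concretely is the discounted return \(G = \sum_t \gamma^t r_t\). A minimal sketch (the reward sequence is illustrative):

```python
def discounted_return(rewards, gamma):
    """G = r_0 + gamma*r_1 + gamma^2*r_2 + ..., computed right to left."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

discounted_return([1.0, 1.0, 1.0], 0.9)  # 1 + 0.9 + 0.81 = 2.71
```

The backward recursion `g = r + gamma * g` is the same structure as the Bellman equation: the value now is the immediate reward plus the discounted value of what follows.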
...
This page covers the tabular methods you need for the preliminary assessment: policy iteration and value iteration, the difference between Monte Carlo and TD, on-policy vs off-policy learning, and the Q-learning update rule. Back to Preliminary.
Why this matters for RL: When the state and action spaces are small enough, we can store one value per state (or state-action) and update them from experience or from the model. Dynamic programming does this when we know the model; Monte Carlo and TD do it from samples. Q-learning is the canonical off-policy TD method and is the basis of many deep RL algorithms (e.g. DQN). You need to know how these methods differ and how to write the Q-learning update.
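The Q-learning update the assessment asks for is one line of arithmetic. A minimal tabular sketch (states, actions, and hyperparameters here are illustrative):

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # off-policy: max, not the action taken
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

Q = defaultdict(float)  # tabular Q, all values start at 0
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
# Q[(0, 1)] is now 0.1 * (1.0 + 0.99 * 0.0 - 0.0) = 0.1
```

The `max` over next-state actions is what makes Q-learning off-policy: it bootstraps from the greedy action regardless of which action the behavior policy actually took (SARSA, by contrast, uses the action taken).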
...