Who is this for? This assessment checks whether you have the math, Python, NumPy, PyTorch, and basic RL concepts needed to start the curriculum comfortably.
New to programming? Start with Phase 0 of the Learning path. Unsure about the math? Try the Math for RL track first. Then use this assessment to check whether you are ready for the curriculum.
These 25 questions assess your foundational knowledge before you begin the 100-chapter reinforcement learning curriculum. Answer honestly; each topic page includes solutions and explanations so you understand the “why,” not just the “what.” If you can answer at least 20 correctly and feel comfortable with the concepts, you are ready to start. If many gave you trouble, review the Prerequisites or the Learning path and come back.
Recommended order
Work through the topics in this order for a logical flow: math foundations → programming (Python, NumPy, PyTorch) → RL framework and value functions → tabular methods → function approximation and deep RL → self-assessment.
- Probability & statistics (Q1–Q2)
- Linear algebra (Q3–Q4)
- Calculus (Q5–Q6)
- Python basics (Q7)
- NumPy (Q8)
- PyTorch basics (Q9)
- RL framework (Q10–Q13)
- Value functions & Bellman (Q14–Q15)
- Tabular methods (Q16–Q19)
- Function approximation & Deep RL (Q20–Q24)
- Final self-assessment (Q25)
Syllabus at a glance
| Topic | What you’ll do | Questions |
|---|---|---|
| Probability & statistics | Sample mean, variance, expectation vs sample average, law of large numbers; bandit-style problems and code | Q1, Q2 |
| Linear algebra | Dot product, matrix-vector product, \(\nabla_w (Aw)\); NumPy snippet | Q3, Q4 |
| Calculus | Derivatives, chain rule, sigmoid; small code check | Q5, Q6 |
| Python basics | Moving average, list comprehensions, dict of returns | Q7 |
| NumPy | Create array, set row, element-wise product; slices and shapes | Q8 |
| PyTorch basics | Tensors, requires_grad, backward(), autograd examples | Q9 |
| RL framework | Agent, environment, state, action, reward; Markov; exploration-exploitation; \(\gamma\) | Q10–Q13 |
| Value functions & Bellman | \(V^\pi(s)\), \(Q^\pi(s,a)\); Bellman expectation equation; tiny MDP | Q14, Q15 |
| Tabular methods | Policy iteration, value iteration; MC vs TD; on-policy vs off-policy; Q-learning update | Q16–Q19 |
| Function approximation & Deep RL | Why function approximation is needed; policy gradient update; ε-greedy vs. noisy nets; experience replay; actor-critic | Q20–Q24 |
| Final self-assessment | Rate comfort in Python, math, and RL; links to review | Q25 |
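To preview the kind of code the tabular-methods questions assume, here is a minimal sketch of one Q-learning update on a toy table. The state/action sizes, hyperparameter values, and the sample transition are illustrative assumptions for this sketch, not part of the assessment:

```python
import numpy as np

# Illustrative sizes and hyperparameters (assumptions for this sketch).
n_states, n_actions = 4, 2
alpha, gamma = 0.1, 0.9  # learning rate and discount factor

Q = np.zeros((n_states, n_actions))

# One Q-learning update for a single transition (s, a, r, s_next):
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
s, a, r, s_next = 0, 1, 1.0, 2
td_target = r + gamma * Q[s_next].max()  # bootstrap from the greedy next action
td_error = td_target - Q[s, a]
Q[s, a] += alpha * td_error

print(Q[s, a])  # 0.1 after one update from a zero-initialized table
```

If you can predict the printed value before running this, the Q-learning question (Q19) should feel comfortable.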
Each topic page includes worked problems with explanations, code examples with explanations, and math examples with step-by-step reasoning. Use them to fill gaps before starting the Curriculum.
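As a quick warm-up for the PyTorch questions (Q9), here is a minimal autograd sketch; the function \(y = x^2\) and the evaluation point are illustrative choices, not taken from the assessment itself:

```python
import torch

# Minimal autograd check: for y = x**2, dy/dx = 2x, so at x = 3 the gradient is 6.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
y.backward()          # populates x.grad with dy/dx
print(x.grad.item())  # 6.0
```

If `requires_grad`, `backward()`, and `.grad` all look familiar, you are in good shape for the PyTorch portion.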