Preliminary
Functions, lists, loops, and list comprehensions – with RL-relevant examples and explained solutions.
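The kind of exercise this module covers can be sketched as follows – a toy, invented example (not from the curriculum) that computes a discounted return with a function, a loop, and a comprehension-style generator:

```python
# Hypothetical RL-flavored exercise: discounted return of a reward list.

def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over the rewards, via a generator expression."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Equivalent explicit loop, for comparison:
rewards = [1.0, 0.0, 2.0]
total = 0.0
for t, r in enumerate(rewards):
    total += 0.9**t * r

# Both compute 1.0 + 0.0 + 0.81 * 2.0 = 2.62.
```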
Arrays, indexing, slicing, and element-wise vs matrix operations – with RL-relevant examples and explanations.
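A minimal sketch of these NumPy operations on an invented Q-table (rows = states, columns = actions; the numbers are arbitrary illustrations):

```python
import numpy as np

# Toy Q-table: 2 states x 2 actions (values are made up).
Q = np.array([[1.0, 2.0],
              [3.0, 0.5]])

greedy = Q.argmax(axis=1)        # indexing along an axis: best action per state
row = Q[0, :]                    # slicing: all action-values of state 0
scaled = Q * 2.0                 # element-wise multiplication
v = Q @ np.array([0.5, 0.5])     # matrix-vector product: values under a uniform policy
```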
Sample mean, variance, expectation, and law of large numbers – with bandit-style problems and explained solutions.
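A bandit-style problem of this kind might look like the following sketch (the arm probability and seed are assumptions for illustration): the incremental sample mean is exactly the standard bandit value estimate, and the law of large numbers says it converges to the true arm probability.

```python
import random

random.seed(0)

def pull(p=0.3):
    """Bernoulli bandit arm paying 1 with probability p (toy example)."""
    return 1.0 if random.random() < p else 0.0

# Incremental sample mean: mean_n = mean_{n-1} + (r_n - mean_{n-1}) / n.
n, mean = 0, 0.0
for _ in range(100_000):
    r = pull()
    n += 1
    mean += (r - mean) / n

# By the law of large numbers, mean is close to p = 0.3 after many pulls.
```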
Vectors, dot product, matrix-vector product, and gradients – with RL motivation and explained solutions.
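One way this connects to RL: a linear value function is a dot product of weights and features, and its gradient with respect to the weights is the feature vector itself. A sketch with invented numbers:

```python
import numpy as np

x = np.array([1.0, 0.5, -2.0])   # feature vector of a state (toy numbers)
w = np.array([0.2, 0.4, 0.1])    # weight vector

v = w @ x                        # dot product: 0.2 + 0.2 - 0.2 = 0.2
target = 1.0
grad = (v - target) * x          # gradient of 0.5 * (v - target)**2 w.r.t. w
w_new = w - 0.1 * grad           # one gradient-descent step
```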
Derivatives, chain rule, sigmoid and softmax – with RL motivation and explained solutions.
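The named functions can be written in a few lines; this sketch also shows the classic chain-rule identity for the sigmoid derivative (the stability trick of subtracting the max in softmax is standard practice):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)                        # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

# Chain rule gives: d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)).
deriv_at_0 = sigmoid(0.0) * (1 - sigmoid(0.0))   # = 0.25
```

In RL, softmax turns action preferences into a policy, which is why its derivative matters for policy gradients.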
Agent, environment, state, action, reward, Markov property, exploration-exploitation, and discount factor – with explanations.
Dynamic programming, Monte Carlo vs TD, on-policy vs off-policy, and Q-learning – with explanations and examples.
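The Q-learning update named here can be shown on a single toy transition (all states, actions, and numbers below are invented for illustration):

```python
# Tabular Q-learning update:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
alpha, gamma = 0.5, 0.9
Q = {("s0", "a0"): 0.0, ("s0", "a1"): 0.0,
     ("s1", "a0"): 1.0, ("s1", "a1"): 2.0}

s, a, r, s2 = "s0", "a0", 1.0, "s1"
td_target = r + gamma * max(Q[(s2, b)] for b in ("a0", "a1"))
Q[(s, a)] += alpha * (td_target - Q[(s, a)])
# Q[("s0","a0")] becomes 0.5 * (1.0 + 0.9 * 2.0) = 1.4
```

Note the `max` over next actions makes this off-policy: the target uses the greedy action regardless of what the behavior policy would do.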
V^π(s), Q^π(s,a), and the Bellman expectation equation – with worked examples and explanations.
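A worked example of this kind: iterative policy evaluation on a tiny two-state MDP (transition probabilities and rewards are invented; one action per state, so the policy term is trivial). Iterating the Bellman expectation equation converges to its fixed point.

```python
# V(s) = R(s) + gamma * sum_s' P(s'|s) * V(s')
gamma = 0.9
R = {"A": 1.0, "B": 0.0}
P = {"A": {"A": 0.5, "B": 0.5}, "B": {"B": 1.0}}

V = {"A": 0.0, "B": 0.0}
for _ in range(1000):
    V = {s: R[s] + gamma * sum(p * V[s2] for s2, p in P[s].items())
         for s in V}

# Fixed point by hand: V(B) = 0 + 0.9 * V(B)  =>  V(B) = 0
#                      V(A) = 1 + 0.45 * V(A) =>  V(A) = 1 / 0.55 ~= 1.818
```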
Why function approximation, policy gradient update, DQN exploration, experience replay, and actor-critic – with explanations.
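Of the topics listed, experience replay is the most self-contained to sketch. A minimal buffer, as used in DQN, might look like this (class and method names are assumptions, not a specific library's API):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions; old entries are evicted."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, done))

    def sample(self, batch_size):
        # Uniform random minibatch, breaking temporal correlations.
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.push(i, 0, 1.0, i + 1, False)
batch = buf.sample(2)
```

With capacity 3, only the last three of the five pushed transitions remain; sampling decorrelates the minibatch from the trajectory order.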
Tensors, requires_grad, backward, and autograd – with RL-relevant examples and explanations.
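The three named pieces fit in a few lines (this assumes PyTorch is installed; the function is an arbitrary example):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)  # track operations on x
y = x ** 2 + 3 * x                         # y = x^2 + 3x
y.backward()                               # autograd fills x.grad with dy/dx = 2x + 3
# At x = 2, x.grad is 7.0
```

In RL, this same mechanism backpropagates a TD or policy-gradient loss through a value or policy network.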
Reflect on your readiness across math, Python, NumPy, PyTorch, and RL concepts before starting the curriculum.