Preliminary
Functions, lists, loops, and list comprehensions – with RL-relevant examples and explained solutions.
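The kind of exercise this module covers can be sketched as follows – a toy, invented example (not from the curriculum) that computes a discounted return with a function, a loop, and a comprehension-style generator:

```python
# Hypothetical RL-flavored exercise: discounted return of a reward list.

def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over the rewards, via a generator expression."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Equivalent explicit loop, for comparison:
rewards = [1.0, 0.0, 2.0]
total = 0.0
for t, r in enumerate(rewards):
    total += 0.9**t * r

# Both compute 1.0 + 0.0 + 0.81 * 2.0 = 2.62.
```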
Arrays, indexing, slicing, and element-wise vs matrix operations – with RL-relevant examples and explanations.
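A minimal sketch of these NumPy operations on an invented Q-table (rows = states, columns = actions; the numbers are arbitrary illustrations):

```python
import numpy as np

# Toy Q-table: 2 states x 2 actions (values are made up).
Q = np.array([[1.0, 2.0],
              [3.0, 0.5]])

greedy = Q.argmax(axis=1)        # indexing along an axis: best action per state
row = Q[0, :]                    # slicing: all action-values of state 0
scaled = Q * 2.0                 # element-wise multiplication
v = Q @ np.array([0.5, 0.5])     # matrix-vector product: values under a uniform policy
```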
Sample mean, variance, expectation, and law of large numbers – with bandit-style problems and explained solutions.
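A bandit-style problem of this kind might look like the following sketch (the arm probability and seed are assumptions for illustration): the incremental sample mean is exactly the standard bandit value estimate, and the law of large numbers says it converges to the true arm probability.

```python
import random

random.seed(0)

def pull(p=0.3):
    """Bernoulli bandit arm paying 1 with probability p (toy example)."""
    return 1.0 if random.random() < p else 0.0

# Incremental sample mean: mean_n = mean_{n-1} + (r_n - mean_{n-1}) / n.
n, mean = 0, 0.0
for _ in range(100_000):
    r = pull()
    n += 1
    mean += (r - mean) / n

# By the law of large numbers, mean is close to p = 0.3 after many pulls.
```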
Vectors, dot product, matrix-vector product, and gradients – with RL motivation and explained solutions.
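One way this connects to RL: a linear value function is a dot product of weights and features, and its gradient with respect to the weights is the feature vector itself. A sketch with invented numbers:

```python
import numpy as np

x = np.array([1.0, 0.5, -2.0])   # feature vector of a state (toy numbers)
w = np.array([0.2, 0.4, 0.1])    # weight vector

v = w @ x                        # dot product: 0.2 + 0.2 - 0.2 = 0.2
target = 1.0
grad = (v - target) * x          # gradient of 0.5 * (v - target)**2 w.r.t. w
w_new = w - 0.1 * grad           # one gradient-descent step
```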
Derivatives, chain rule, sigmoid and softmax – with RL motivation and explained solutions.
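The named functions can be written in a few lines; this sketch also shows the classic chain-rule identity for the sigmoid derivative (the stability trick of subtracting the max in softmax is standard practice):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)                        # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

# Chain rule gives: d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)).
deriv_at_0 = sigmoid(0.0) * (1 - sigmoid(0.0))   # = 0.25
```

In RL, softmax turns action preferences into a policy, which is why its derivative matters for policy gradients.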
Agent, environment, state, action, reward, Markov property, exploration-exploitation, and discount factor – with explanations.
Dynamic programming, Monte Carlo vs TD, on-policy vs off-policy, and Q-learning – with explanations and examples.
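The Q-learning update named here can be shown on a single toy transition (all states, actions, and numbers below are invented for illustration):

```python
# Tabular Q-learning update:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
alpha, gamma = 0.5, 0.9
Q = {("s0", "a0"): 0.0, ("s0", "a1"): 0.0,
     ("s1", "a0"): 1.0, ("s1", "a1"): 2.0}

s, a, r, s2 = "s0", "a0", 1.0, "s1"
td_target = r + gamma * max(Q[(s2, b)] for b in ("a0", "a1"))
Q[(s, a)] += alpha * (td_target - Q[(s, a)])
# Q[("s0","a0")] becomes 0.5 * (1.0 + 0.9 * 2.0) = 1.4
```

Note the `max` over next actions makes this off-policy: the target uses the greedy action regardless of what the behavior policy would do.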
V^π(s), Q^π(s,a), and the Bellman expectation equation – with worked examples and explanations.
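A worked example of this kind: iterative policy evaluation on a tiny two-state MDP (transition probabilities and rewards are invented; one action per state, so the policy term is trivial). Iterating the Bellman expectation equation converges to its fixed point.

```python
# V(s) = R(s) + gamma * sum_s' P(s'|s) * V(s')
gamma = 0.9
R = {"A": 1.0, "B": 0.0}
P = {"A": {"A": 0.5, "B": 0.5}, "B": {"B": 1.0}}

V = {"A": 0.0, "B": 0.0}
for _ in range(1000):
    V = {s: R[s] + gamma * sum(p * V[s2] for s2, p in P[s].items())
         for s in V}

# Fixed point by hand: V(B) = 0 + 0.9 * V(B)  =>  V(B) = 0
#                      V(A) = 1 + 0.45 * V(A) =>  V(A) = 1 / 0.55 ~= 1.818
```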
Why function approximation, policy gradient update, DQN exploration, experience replay, and actor-critic – with explanations.
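Of the topics listed, experience replay is the most self-contained to sketch. A minimal buffer, as used in DQN, might look like this (class and method names are assumptions, not a specific library's API):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions; old entries are evicted."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, done))

    def sample(self, batch_size):
        # Uniform random minibatch, breaking temporal correlations.
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.push(i, 0, 1.0, i + 1, False)
batch = buf.sample(2)
```

With capacity 3, only the last three of the five pushed transitions remain; sampling decorrelates the minibatch from the trajectory order.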
Tensors, requires_grad, backward, and autograd – with RL-relevant examples and explanations.
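The three named pieces fit in a few lines (this assumes PyTorch is installed; the function is an arbitrary example):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)  # track operations on x
y = x ** 2 + 3 * x                         # y = x^2 + 3x
y.backward()                               # autograd fills x.grad with dy/dx = 2x + 3
# At x = 2, x.grad is 7.0
```

In RL, this same mechanism backpropagates a TD or policy-gradient loss through a value or policy network.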
Reflect on your readiness across math, Python, NumPy, PyTorch, and RL concepts before starting the curriculum.