Learning Path
Anchor scenarios used throughout the learning path and curriculum to ground RL concepts in practice.
Install Python, run your first script, and learn variables, conditionals, loops, and functions before RL.
Probability, statistics, linear algebra, and calculus with RL-motivated examples. Read in order: 1a through 1d, then the self-check.
Python, NumPy, PyTorch, Gym/Gymnasium, and related tools the curriculum assumes. Complete tasks on the prerequisites index, then the Phase 2 quiz.
Deeper pass through the same math areas as Phase 1, with more drills and RL-motivated examples before you start the core RL volumes.
Supervised learning, regression, classification, gradient descent, and evaluation, covered before neural networks for RL.
Neural networks, backpropagation, CNNs, PyTorch patterns, and a mini-project, all directly reusable for DQN, policies, and actor-critic.
Volumes 1–2: MDPs, dynamic programming, Monte Carlo, TD, SARSA, and tabular Q-learning. Core theory before function approximation.
Volumes 3–5: value function approximation, DQN family, policy gradients, actor-critic, and advanced policy optimization (chapters 21–50).
Volumes 6–10: model-based RL, exploration, offline RL, MARL, real-world RL, safety, and RL with LLMs (chapters 51–100).