This learning path takes you from zero programming experience to understanding and building reinforcement learning systems. Follow the phases in order; each phase builds on the previous one. The order matches the Course outline (basic to advanced).

Not ready for the Preliminary assessment? If you have never programmed, start with Phase 0. If the assessment feels hard, follow this learning path in order and return to it when you are ready.


Phase 0 — Programming from zero

For: Anyone with no prior coding experience.

Duration: About 2–4 weeks (at a few hours per week).

What you will do: Install Python, run your first script, and learn variables, types, conditionals, loops, and functions.

Outcomes:

In RL, this leads to: Every RL implementation is code. You will write loops over episodes, conditionals for exploration vs. exploitation, and functions for environments and agents.
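The building blocks named above can be seen together in a tiny sketch (the environment and numbers here are made up for illustration, not part of the curriculum):

```python
import random

random.seed(0)  # reproducible runs

def make_env():
    """A toy one-step environment: action 1 pays reward 1, action 0 pays 0."""
    def step(action):
        return 1.0 if action == 1 else 0.0
    return step

def choose_action(epsilon):
    """Epsilon-greedy: a conditional deciding exploration vs. exploitation."""
    if random.random() < epsilon:     # explore with probability epsilon
        return random.choice([0, 1])
    return 1                          # exploit: pretend we learned action 1 is best

env_step = make_env()
total = 0.0
for episode in range(100):            # a loop over episodes
    total += env_step(choose_action(epsilon=0.1))
```

Variables, a conditional, a loop, and two functions: that is already the skeleton of an agent-environment loop.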

Start here: Phase 0: Programming from zero


Phase 1 — Math foundations for RL

For: Learners who can write basic Python (or have finished Phase 0 and the Python prerequisite) and want to solidify the math used in RL.

Duration: About 2–4 weeks.

What you will do: Study probability & statistics, linear algebra, and calculus with RL-motivated examples and practice. Work through the sub-phases in order, then take the self-check.

Sub-phases:

Outcomes:

In RL, this leads to: Value functions are expectations; states and observations are vectors; policy gradients use calculus. Solid math makes every chapter easier.
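As a concrete taste of "value functions are expectations; states are vectors," an expectation over a handful of outcomes is just a dot product (the probabilities and returns below are invented for illustration):

```python
import numpy as np

# A value is an expectation: V = E[G] = sum over outcomes of p(s) * G(s)
probs   = np.array([0.2, 0.5, 0.3])   # probabilities of three outcomes
returns = np.array([1.0, 0.0, 2.0])   # return received in each outcome
V = probs @ returns                    # expectation as a dot product
# V = 0.2*1.0 + 0.5*0.0 + 0.3*2.0 = 0.8
```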

Start here: Math for RL → then Phase 1 self-check


Phase 2 — Prerequisites (tools and libraries)

For: Learners who have basic programming (and ideally some math) and are ready to use the stack the curriculum assumes.

Duration: About 3–6 weeks, depending on how much you already know.

What you will do: Work through Python (full), NumPy, Pandas, Matplotlib, PyTorch, TensorFlow, and Gym as needed. Each prerequisite page explains why RL needs it; complete the small task listed for each topic on the Prerequisites index, then take the Phase 2 readiness quiz.


Outcomes:

In RL, this leads to: The curriculum exercises assume this stack. Each prerequisite includes Professor’s hints and common pitfalls to help you avoid mistakes.

Start here: Prerequisites → Phase 2 readiness quiz


Phase 3 — RL foundations

For: Learners who have completed (or tested out of) Phases 0–2 and are ready for the core RL curriculum.

Duration: About 4–8 weeks.

What you will do: Complete Volume 1: Mathematical Foundations and Volume 2: Tabular Methods & Classic Algorithms (chapters 1–20). Use the milestone checkpoints and mini-project (tabular Q-learning on a 5×5 Gridworld) on the Phase 3 page, then take the Phase 3 foundations quiz.
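To preview what the mini-project involves, here is a minimal sketch of tabular Q-learning on a 5×5 Gridworld (the layout, rewards, and hyperparameters are illustrative, not the Phase 3 specification):

```python
import random

# Tabular Q-learning sketch: start at (0, 0); reaching (4, 4) pays 1 and ends the episode.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = {(r, c): [0.0] * 4 for r in range(5) for c in range(5)}

def step(state, a_idx):
    """Move within the grid, clipping at the walls; goal state is terminal."""
    dr, dc = ACTIONS[a_idx]
    nxt = (min(max(state[0] + dr, 0), 4), min(max(state[1] + dc, 0), 4))
    done = nxt == (4, 4)
    return nxt, (1.0 if done else 0.0), done

random.seed(0)
for episode in range(500):
    s, done = (0, 0), False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.randrange(4)
        else:
            a = max(range(4), key=lambda i: Q[s][i])
        s2, reward, done = step(s, a)
        target = reward + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])   # the Q-learning update
        s = s2
```

After a few hundred episodes the greedy policy follows a shortest path to the goal, and the start-state value approaches its discounted optimum.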

Outcomes:

In RL, this leads to: Everything that follows (DQN, policy gradients, etc.) builds on these ideas. Do not skip this phase.

Start here: Volume 1: Mathematical Foundations → Phase 3 milestones & mini-project → Phase 3 foundations quiz


Phase 4 — Deep RL

For: Learners who have finished Volumes 1–2 and want to scale to large or continuous state spaces.

Duration: About 6–12 weeks.

What you will do: Complete Volume 3: Value Function Approximation & Deep Q-Learning, Volume 4: Policy Gradients, and Volume 5: Advanced Policy Optimization (chapters 21–50).

Outcomes:

In RL, this leads to: Most practical applications use deep RL. This phase is where you go from “understanding the theory” to “building agents that work in complex environments.”
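To hint at where this phase goes, the core policy-gradient (REINFORCE) update can be shown on a two-armed bandit in a few lines; the setup below is an invented toy, not an exercise from Volume 4:

```python
import math
import random

# REINFORCE on a two-armed bandit (illustrative sketch).
# Policy: pi(a=1) = sigmoid(theta); arm 1 pays reward 1, arm 0 pays 0.
random.seed(0)
theta, lr = 0.0, 0.1

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for step in range(2000):
    p1 = sigmoid(theta)
    a = 1 if random.random() < p1 else 0   # sample an action from the policy
    reward = float(a)                      # arm 1 is the better arm
    # For a Bernoulli policy, grad of log pi(a | theta) w.r.t. theta is (a - p1)
    theta += lr * reward * (a - p1)        # REINFORCE: reward-weighted log-prob gradient
```

After training, the policy's probability of pulling the better arm climbs toward 1, which is the whole idea scaled down to one parameter.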

Start here: Volume 3: Value Function Approximation & Deep Q-Learning → Phase 4 milestones & coding challenges → Phase 4 Deep RL quiz


Phase 5 — Advanced topics

For: Learners who have completed Phases 3–4 and want to go deeper.

Duration: Ongoing (pick topics as needed).

What you will do: Work through Volumes 6–10 (chapters 51–100): model-based RL, exploration, offline RL, multi-agent RL, real-world applications, safety, and RL with large language models. Each volume has a topic roadmap (what you will learn per chapter); use it to pick a path. An optional Phase 5 project (e.g. offline RL on a fixed dataset, or a simple multi-agent scenario) ties concepts together.

Topic roadmaps (after this you will…):

Optional project: Implement offline RL on a fixed dataset (e.g. from a random or expert policy) using conservative Q-learning (CQL); or implement a simple two-agent cooperative task with parameter sharing (MAPPO or IQL).

Outcomes:

In RL, this leads to: Research and industry applications. Use the curriculum as a map and dive into the areas that interest you most.

Start here: Volume 6: Model-Based RL & Planning


Quick reference

Phase   Content                  Duration (approx.)
0       Programming from zero    2–4 weeks
1       Math for RL              2–4 weeks
2       Prerequisites            3–6 weeks
3       Volume 1 + Volume 2      4–8 weeks
4       Volumes 3–5              6–12 weeks
5       Volumes 6–10             Ongoing

Good luck on your journey from zero to mastery.