This page lists every topic in the intended order: from the welcome material and bandits through MDPs, dynamic programming, Monte Carlo methods, temporal difference learning, approximation methods, projects, and the appendix. Follow this outline for a clear basic-to-advanced path. Each item links to the relevant curriculum chapter, prerequisite, or dedicated page.


Welcome

| Topic | Where to find it |
| --- | --- |
| Introduction | Home |
| Course Outline and Big Picture | This page |
| Where to get the Code | Dedicated page |
| How to Succeed in this Course | Dedicated page |

Warmup — Multi-Armed Bandit

| Topic | Where to find it |
| --- | --- |
| Section Introduction: The Explore-Exploit Dilemma | Chapter 2: Multi-Armed Bandits |
| Applications of the Explore-Exploit Dilemma | Chapter 2 |
| Epsilon-Greedy Theory | Chapter 2 |
| Calculating a Sample Mean (pt 1) | Math for RL: Probability |
| Epsilon-Greedy Beginner’s Exercise Prompt | Chapter 2 |
| Designing Your Bandit Program | Chapter 2 |
| Epsilon-Greedy in Code | Chapter 2 |
| Comparing Different Epsilons | Chapter 2 |
| Optimistic Initial Values Theory | Chapter 2 (hints); Bandits: Optimistic Initial Values |
| Optimistic Initial Values Beginner’s Exercise Prompt | Bandits: Optimistic Initial Values |
| Optimistic Initial Values Code | Bandits: Optimistic Initial Values |
| UCB1 Theory | Dedicated page |
| UCB1 Beginner’s Exercise Prompt | Bandits: UCB1 |
| UCB1 Code | Bandits: UCB1 |
| Bayesian Bandits / Thompson Sampling Theory (pt 1) | Dedicated page |
| Bayesian Bandits / Thompson Sampling Theory (pt 2) | Bandits: Thompson Sampling |
| Thompson Sampling Beginner’s Exercise Prompt | Bandits: Thompson Sampling |
| Thompson Sampling Code | Bandits: Thompson Sampling |
| Thompson Sampling With Gaussian Reward Theory | Bandits: Thompson Sampling |
| Thompson Sampling With Gaussian Reward Code | Bandits: Thompson Sampling |
| Exercise on Gaussian Rewards | Bandits: Thompson Sampling |
| Why don’t we just use a library? | Dedicated page |
| Nonstationary Bandits | Dedicated page |
| Bandit Summary, Real Data, and Online Learning | Chapter 2; Bandits: Nonstationary |
| (Optional) Alternative Bandit Designs | Chapter 2 |
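Several of the epsilon-greedy rows above (theory, exercise prompt, code) revolve around one simple loop: pull an arm, observe a reward, update a sample-mean estimate. A minimal sketch of that loop, where the arm means, epsilon, and step count are illustrative values rather than anything from the course:

```python
import numpy as np

def run_epsilon_greedy(true_means, epsilon=0.1, n_steps=10_000, seed=0):
    """Play a Gaussian bandit, estimating each arm's value by its sample mean."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    Q = np.zeros(k)   # estimated value of each arm
    N = np.zeros(k)   # number of pulls of each arm
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            a = int(rng.integers(k))     # explore: random arm
        else:
            a = int(np.argmax(Q))        # exploit: greedy arm
        r = rng.normal(true_means[a], 1.0)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]        # incremental sample-mean update
        total_reward += r
    return Q, total_reward / n_steps

Q, avg = run_epsilon_greedy([0.1, 0.5, 0.9])
```

The incremental update line is the same "Calculating a Sample Mean" trick listed above: it keeps a running mean without storing every reward.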

High-Level Overview of Reinforcement Learning

| Topic | Where to find it |
| --- | --- |
| What is Reinforcement Learning? | Chapter 1 |
| From Bandits to Full Reinforcement Learning | Chapter 1, Chapter 2 |
| Markov Decision Processes | Chapter 3 |

MDP Section

| Topic | Where to find it |
| --- | --- |
| MDP Section Introduction | Chapter 3: MDPs |
| Gridworld | Dedicated page |
| Choosing Rewards | Dedicated page |
| The Markov Property | Chapter 3 |
| Markov Decision Processes (MDPs) | Chapter 3 |
| Future Rewards | Chapter 4: Reward Hypothesis, Chapter 5: Value Functions |
| Value Functions | Chapter 5 |
| The Bellman Equation (pt 1–3) | Chapter 6: The Bellman Equations |
| Bellman Examples | Chapter 6 |
| Optimal Policy and Optimal Value Function (pt 1–2) | Chapter 6 |
| MDP Summary | Chapter 3; Chapter 6 |
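For reference, the Bellman equation and optimal value function rows above correspond to two standard identities. In the usual notation, with policy $\pi$, discount $\gamma$, and dynamics $p(s', r \mid s, a)$:

```latex
% Bellman expectation equation for the state-value function of a policy pi
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma V^{\pi}(s') \bigr]

% Bellman optimality equation for the optimal value function
V^{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma V^{*}(s') \bigr]
```

The first averages over the policy's action choices; the second replaces that average with a max, which is what dynamic programming exploits in the next section.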

Dynamic Programming

| Topic | Where to find it |
| --- | --- |
| Dynamic Programming Section Introduction | Volume 1 |
| Iterative Policy Evaluation | Chapter 7 |
| Designing Your RL Program | Chapter 7 |
| Gridworld in Code | Dedicated page |
| Iterative Policy Evaluation in Code | Dedicated page |
| Windy Gridworld | Dedicated page |
| Iterative Policy Evaluation for Windy Gridworld | Windy Gridworld |
| Policy Improvement | Chapter 8: Policy Iteration |
| Policy Iteration | Chapter 8 |
| Policy Iteration in Code | Chapter 8; DP code walkthrough |
| Policy Iteration in Windy Gridworld | Windy Gridworld |
| Value Iteration | Chapter 9 |
| Value Iteration in Code | Chapter 9; DP code walkthrough |
| Dynamic Programming Summary | Chapter 10: Limitations of DP |
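The value iteration rows above apply the Bellman optimality backup repeatedly until the value function stops changing. A minimal sketch on a made-up 1-D chain (an illustrative stand-in for the course's gridworld, not its actual environment):

```python
import numpy as np

# Value iteration on a toy 1-D gridworld: states 0..4, state 4 is the terminal
# goal. Actions: 0 = left, 1 = right; reward -1 per move, +1 for entering the goal.
n_states, gamma, theta = 5, 0.9, 1e-8
V = np.zeros(n_states)

def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
    return s2, (1.0 if s2 == n_states - 1 else -1.0)

def q_values(s):
    """One-step lookahead value of each action from state s."""
    out = []
    for a in (0, 1):
        s2, r = step(s, a)
        out.append(r + gamma * V[s2])
    return out

while True:
    delta = 0.0
    for s in range(n_states - 1):          # skip the terminal state
        best = max(q_values(s))            # Bellman optimality backup
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:                      # stop when updates are tiny
        break

# greedy policy extracted from the converged value function
policy = [int(np.argmax(q_values(s))) for s in range(n_states - 1)]
```

Policy iteration covers the same ground with an explicit evaluate-then-improve cycle; value iteration folds both into one sweep.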

Monte Carlo

| Topic | Where to find it |
| --- | --- |
| Monte Carlo Intro | Chapter 11 |
| Monte Carlo Policy Evaluation | Chapter 11 |
| Monte Carlo Policy Evaluation in Code | Dedicated page |
| Monte Carlo Control | Chapter 11 |
| Monte Carlo Control in Code | Monte Carlo in Code |
| Monte Carlo Control without Exploring Starts | Chapter 11; Monte Carlo in Code |
| Monte Carlo Control without Exploring Starts in Code | Monte Carlo in Code |
| Monte Carlo Summary | Chapter 11 |
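Monte Carlo policy evaluation, listed above, estimates a state's value by averaging the returns observed after visiting it. A first-visit sketch on a tiny two-state episodic process invented purely for illustration (the course works on gridworld instead):

```python
import random
from collections import defaultdict

# Toy episodic process: from state 0 a coin flip either ends the episode with
# reward 0, or moves to state 1, from which the episode ends with reward 1.
gamma = 1.0

def generate_episode(rng):
    """Return a list of (state, reward-on-leaving-state) pairs."""
    if rng.random() < 0.5:
        return [(0, 0.0)]                  # terminate immediately
    return [(0, 0.0), (1, 1.0)]           # pass through state 1, reward 1

rng = random.Random(42)
returns = defaultdict(list)
for _ in range(20_000):
    episode = generate_episode(rng)
    G = 0.0
    # walk the episode backwards, accumulating the return G
    for t in range(len(episode) - 1, -1, -1):
        s, r = episode[t]
        G = r + gamma * G
        if s not in {st for st, _ in episode[:t]}:   # first visit only
            returns[s].append(G)

V = {s: sum(gs) / len(gs) for s, gs in returns.items()}
```

Here V[1] is exactly 1 and V[0] converges to 0.5, the probability-weighted average of its two possible returns; no model of the transition probabilities was needed, which is the whole point of the Monte Carlo section.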

Temporal Difference Learning

| Topic | Where to find it |
| --- | --- |
| Temporal Difference Introduction | Chapter 12 |
| TD(0) Prediction | Chapter 12 |
| TD(0) Prediction in Code | Dedicated page |
| SARSA | Chapter 13 |
| SARSA in Code | TD, SARSA, Q-Learning in Code |
| Q-Learning | Chapter 14 |
| Q-Learning in Code | TD, SARSA, Q-Learning in Code |
| TD Learning Section Summary | Chapter 12; Chapter 14 |
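The SARSA and Q-Learning rows above cover TD control. A minimal tabular Q-learning sketch on the same style of invented chain used earlier (not the course's gridworld); the off-policy character shows up in the max over next actions in the update:

```python
import numpy as np

# Tabular Q-learning: states 0..4, terminal goal at state 4,
# actions 0 = left / 1 = right, reward -1 per move, +1 for entering the goal.
n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

def step(s, a):
    s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
    return s2, (1.0 if s2 == n_states - 1 else -1.0)

for _ in range(2000):                      # episodes
    s = 0
    while s != n_states - 1:
        # epsilon-greedy behavior policy
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # off-policy TD target: max over actions in the next state
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2

greedy = [int(np.argmax(Q[s])) for s in range(n_states - 1)]
```

SARSA differs in exactly one place: the target uses the action the behavior policy actually takes next, `Q[s2, a2]`, instead of `np.max(Q[s2])`.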

Approximation Methods

| Topic | Where to find it |
| --- | --- |
| Approximation Methods Section Introduction | Volume 3 |
| Linear Models for Reinforcement Learning | Chapter 21 |
| Feature Engineering | Dedicated page |
| Approximation Methods for Prediction | Chapter 21 |
| Approximation Methods for Prediction Code | Chapter 21 |
| Approximation Methods for Control | Chapter 22; Chapter 30 |
| Approximation Methods for Control Code | Volume 3 |
| CartPole | Dedicated page |
| CartPole Code | CartPole |
| Approximation Methods Exercise | Volume 3 chapters |
| Approximation Methods Section Summary | Volume 3 |
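The prediction rows above replace the value table with a parameterized function. A semi-gradient TD(0) sketch with a linear model, on an invented random-walk chain; one-hot features are used so the linear model can represent the value function exactly (real feature engineering, as the row above notes, is the interesting part):

```python
import numpy as np

# Semi-gradient TD(0) with a linear value model on a random-walk chain:
# states 0..4, episodes start at 2, terminate at either end, reward +1 only
# on the right end. The environment is illustrative, not from the course.
n_states, alpha, gamma = 5, 0.05, 1.0
w = np.zeros(n_states)                     # weights of the linear model
rng = np.random.default_rng(1)

def features(s):
    """One-hot feature vector for state s."""
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

for _ in range(5000):                      # episodes under a 50/50 random policy
    s = 2
    while True:
        s2 = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s2 == n_states - 1 else 0.0
        done = s2 in (0, n_states - 1)
        v_next = 0.0 if done else w @ features(s2)
        # semi-gradient update: grad of v(s; w) = w @ x(s) w.r.t. w is x(s)
        w += alpha * (r + gamma * v_next - w @ features(s)) * features(s)
        if done:
            break
        s = s2
```

With one-hot features this reduces to tabular TD(0); swapping in richer features (tilings, RBFs, learned representations) is what turns it into a genuine approximation method.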

Interlude: Common Beginner Questions

| Topic | Where to find it |
| --- | --- |
| This Course vs. RL Book: What’s the Difference? | Dedicated page |
| Stock Trading Project with Reinforcement Learning | Dedicated section |
| Beginners, halt! Stop here if you skipped ahead | Stock Trading intro |
| Stock Trading Project Section Introduction | Stock Trading |
| Data and Environment | Stock Trading: Data and Environment |
| How to Model Q for Q-Learning | Stock Trading: How to Model Q |
| Design of the Program | Stock Trading: Design |
| Code pt 1–4 | Stock Trading |
| Stock Trading Project Discussion | Stock Trading |
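On the "How to Model Q for Q-Learning" row: with continuous state features and a few discrete actions, one common choice is a separate linear model per action. Everything below (feature count, action set, numbers) is illustrative, not the course's actual design:

```python
import numpy as np

# Q-learning with function approximation for a trading-style problem:
# continuous state features, one linear model per discrete action.
n_features, n_actions = 3, 3               # e.g. actions: sell / hold / buy
alpha, gamma = 0.01, 0.99
W = np.zeros((n_actions, n_features))      # one weight vector per action

def predict(s):
    """Q(s, a) for all actions at once."""
    return W @ s

def update(s, a, r, s2, done):
    """Semi-gradient Q-learning update for the chosen action's weights."""
    target = r + (0.0 if done else gamma * np.max(predict(s2)))
    W[a] += alpha * (target - predict(s)[a]) * s

s = np.array([1.0, 0.5, -0.2])             # made-up feature vector
update(s, a=2, r=0.1, s2=s, done=False)
```

Keeping one weight vector per action lets `predict` score all actions in a single matrix-vector product, which makes greedy action selection cheap.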

Appendix / FAQ

| Topic | Where to find it |
| --- | --- |
| What is the Appendix? | Appendix index |
| Setting Up Your Environment | Dedicated page |
| Pre-Installation Check | Setting Up Your Environment |
| Anaconda Environment Setup | Dedicated page |
| How to install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, TensorFlow | Installing Libraries |
| How to Code by Yourself (part 1) | Dedicated page |
| How to Code by Yourself (part 2) | Dedicated page |
| Proof that using Jupyter Notebook is the same as not using it | Appendix |
| Python 2 vs Python 3 | Prerequisites: Python |
| Effective Learning Strategies | Dedicated page |
| How to Succeed in this Course (Long Version) | Dedicated page |
| Is this for Beginners or Experts? Academic or Practical? Pace | Dedicated page |
| Machine Learning and AI Prerequisite Roadmap (pt 1–2) | Dedicated page |

Part 2 — Advanced (Volumes 4–10)

After the topics above, the curriculum continues with 70 more chapters in order:

| Volume | Topics |
| --- | --- |
| Volume 4: Policy Gradients | Policy-based methods, REINFORCE, actor-critic, A2C, A3C, DDPG, TD3 (Ch 31–40) |
| Volume 5: Advanced Policy Optimization | TRPO, PPO, SAC, hyperparameter tuning (Ch 41–50) |
| Volume 6: Model-Based RL & Planning | World models, MCTS, AlphaZero, Dreamer, MBPO, PETS (Ch 51–60) |
| Volume 7: Exploration and Meta-Learning | Hard exploration, intrinsic motivation, RND, Go-Explore, MAML, RL² (Ch 61–70) |
| Volume 8: Offline RL & Imitation Learning | CQL, Decision Transformers, behavioral cloning, IRL, GAIL, RLHF (Ch 71–80) |
| Volume 9: Multi-Agent RL (MARL) | Game theory, IQL, CTDE, MADDPG, VDN, QMIX, MAPPO (Ch 81–90) |
| Volume 10: Real-World RL, Safety & LLMs | Robotics, safe RL, trading, recommenders, RLHF for LLMs, evaluation (Ch 91–100) |

See the full Curriculum for all 100 chapters.