This page lists every topic in the intended order, from the welcome material and bandits through MDPs, dynamic programming, Monte Carlo, temporal difference learning, approximation methods, the projects, and the appendix. Follow this outline for a clear beginner-to-advanced path. Each item points to the relevant curriculum chapter, prerequisite, or dedicated page.
Welcome
| Topic | Where to find it |
|---|---|
| Introduction | Home |
| Course Outline and Big Picture | This page |
| Where to get the Code | Dedicated page |
| How to Succeed in this Course | Dedicated page |
Warmup — Multi-Armed Bandit
| Topic | Where to find it |
|---|---|
| Section Introduction: The Explore-Exploit Dilemma | Chapter 2: Multi-Armed Bandits |
| Applications of the Explore-Exploit Dilemma | Chapter 2 |
| Epsilon-Greedy Theory | Chapter 2 |
| Calculating a Sample Mean (pt 1) | Math for RL: Probability |
| Epsilon-Greedy Beginner’s Exercise Prompt | Chapter 2 |
| Designing Your Bandit Program | Chapter 2 |
| Epsilon-Greedy in Code | Chapter 2 |
| Comparing Different Epsilons | Chapter 2 |
| Optimistic Initial Values Theory | Chapter 2 (hints); Bandits: Optimistic Initial Values |
| Optimistic Initial Values Beginner’s Exercise Prompt | Bandits: Optimistic Initial Values |
| Optimistic Initial Values Code | Bandits: Optimistic Initial Values |
| UCB1 Theory | Dedicated page |
| UCB1 Beginner’s Exercise Prompt | Bandits: UCB1 |
| UCB1 Code | Bandits: UCB1 |
| Bayesian Bandits / Thompson Sampling Theory (pt 1) | Dedicated page |
| Bayesian Bandits / Thompson Sampling Theory (pt 2) | Bandits: Thompson Sampling |
| Thompson Sampling Beginner’s Exercise Prompt | Bandits: Thompson Sampling |
| Thompson Sampling Code | Bandits: Thompson Sampling |
| Thompson Sampling With Gaussian Reward Theory | Bandits: Thompson Sampling |
| Thompson Sampling With Gaussian Reward Code | Bandits: Thompson Sampling |
| Exercise on Gaussian Rewards | Bandits: Thompson Sampling |
| Why don’t we just use a library? | Dedicated page |
| Nonstationary Bandits | Dedicated page |
| Bandit Summary, Real Data, and Online Learning | Chapter 2; Bandits: Nonstationary |
| (Optional) Alternative Bandit Designs | Chapter 2 |
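The bandit table above lists several algorithms by name only; as a minimal taste of the first one, here is a sketch of epsilon-greedy with an incrementally updated sample mean. The Gaussian arms, arm means, and function name are illustrative assumptions, not the course's code:

```python
import random

def run_epsilon_greedy(true_means, epsilon, n_steps, seed=0):
    """Minimal epsilon-greedy bandit: estimate each arm's mean reward
    with an incrementally updated sample mean."""
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k   # current sample-mean estimate per arm
    counts = [0] * k        # number of pulls per arm
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # hypothetical Gaussian arm
        counts[arm] += 1
        # incremental sample mean: Q <- Q + (r - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward
```

With a small epsilon, the agent mostly exploits its current best estimate while still sampling every arm often enough for the sample means to converge.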
High-Level Overview of Reinforcement Learning
| Topic | Where to find it |
|---|---|
| What is Reinforcement Learning? | Chapter 1 |
| From Bandits to Full Reinforcement Learning | Chapter 1, Chapter 2 |
| Markov Decision Processes | Chapter 3 |
MDP Section
| Topic | Where to find it |
|---|---|
| MDP Section Introduction | Chapter 3: MDPs |
| Gridworld | Dedicated page |
| Choosing Rewards | Dedicated page |
| The Markov Property | Chapter 3 |
| Markov Decision Processes (MDPs) | Chapter 3 |
| Future Rewards | Chapter 4: Reward Hypothesis, Chapter 5: Value Functions |
| Value Functions | Chapter 5 |
| The Bellman Equation (pt 1–3) | Chapter 6: The Bellman Equations |
| Bellman Examples | Chapter 6 |
| Optimal Policy and Optimal Value Function (pt 1–2) | Chapter 6 |
| MDP Summary | Chapter 3 – Chapter 6 |
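The Bellman expectation equation covered above can be made concrete with one synchronous backup, V(s) <- sum_a pi(a|s) sum_s' p(s'|s,a) [r + gamma V(s')]. The two-state MDP, transitions, and policy below are invented purely for illustration:

```python
gamma = 0.9

# p[(s, a)] = list of (probability, next_state, reward) for a toy MDP
p = {
    ("s0", "stay"): [(1.0, "s0", 0.0)],
    ("s0", "go"):   [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
    ("s1", "stay"): [(1.0, "s1", 2.0)],
    ("s1", "go"):   [(1.0, "s0", 0.0)],
}
# pi[s][a] = probability of taking action a in state s
pi = {"s0": {"stay": 0.5, "go": 0.5}, "s1": {"stay": 1.0, "go": 0.0}}

def bellman_backup(V):
    """Return the value function after one synchronous Bellman sweep."""
    new_V = {}
    for s in V:
        total = 0.0
        for a, pa in pi[s].items():
            for prob, s2, r in p[(s, a)]:
                total += pa * prob * (r + gamma * V[s2])
        new_V[s] = total
    return new_V

V = bellman_backup({"s0": 0.0, "s1": 0.0})
# after one sweep from zero: V(s0) = 0.5*0.8*1.0 = 0.4, V(s1) = 2.0
```

Repeating the sweep until the values stop changing is exactly the iterative policy evaluation algorithm in the dynamic programming section below.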
Dynamic Programming
| Topic | Where to find it |
|---|---|
| Dynamic Programming Section Introduction | Volume 1 |
| Iterative Policy Evaluation | Chapter 7 |
| Designing Your RL Program | Chapter 7 |
| Gridworld in Code | Dedicated page |
| Iterative Policy Evaluation in Code | Dedicated page |
| Windy Gridworld | Dedicated page |
| Iterative Policy Evaluation for Windy Gridworld | Windy Gridworld |
| Policy Improvement | Chapter 8: Policy Iteration |
| Policy Iteration | Chapter 8 |
| Policy Iteration in Code | Chapter 8; DP code walkthrough |
| Policy Iteration in Windy Gridworld | Windy Gridworld |
| Value Iteration | Chapter 9 |
| Value Iteration in Code | Chapter 9; DP code walkthrough |
| Dynamic Programming Summary | Chapter 10: Limitations of DP |
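As a sketch of the value iteration topic listed above, here is a minimal implementation on a hypothetical 1-D gridworld; the five-state layout and reward scheme are assumptions for illustration, not the course's Gridworld:

```python
def value_iteration(n_states=5, gamma=0.9, tol=1e-8):
    """Value iteration on a hypothetical 1-D gridworld: states 0..n-1,
    actions left/right (walls clamp movement), reward +1 for entering
    the terminal rightmost state."""
    V = [0.0] * n_states
    terminal = n_states - 1
    while True:
        delta = 0.0
        for s in range(n_states):
            if s == terminal:
                continue  # terminal state keeps value 0
            candidates = []
            for step in (-1, +1):  # left, right
                s2 = min(max(s + step, 0), n_states - 1)
                r = 1.0 if s2 == terminal else 0.0
                candidates.append(r + gamma * V[s2])
            best = max(candidates)  # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```

The converged values fall off geometrically with distance from the goal: V(s) = gamma^(distance - 1), which is what discounting predicts.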
Monte Carlo
| Topic | Where to find it |
|---|---|
| Monte Carlo Intro | Chapter 11 |
| Monte Carlo Policy Evaluation | Chapter 11 |
| Monte Carlo Policy Evaluation in Code | Dedicated page |
| Monte Carlo Control | Chapter 11 |
| Monte Carlo Control in Code | Monte Carlo in Code |
| Monte Carlo Control without Exploring Starts | Chapter 11; Monte Carlo in Code |
| Monte Carlo Control without Exploring Starts in Code | Monte Carlo in Code |
| Monte Carlo Summary | Chapter 11 |
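To illustrate the Monte Carlo policy evaluation topic above, here is a first-visit MC prediction sketch; the 5-state random walk and equiprobable policy are illustrative assumptions, not the course's environment:

```python
import random

def mc_first_visit(n_episodes=2000, gamma=1.0, seed=0):
    """First-visit Monte Carlo prediction on a hypothetical 5-state random
    walk: states 0..4, start at 2, terminate at 0 (reward 0) or 4 (reward 1),
    policy = move left/right with equal probability. True V(s) = s/4."""
    rng = random.Random(seed)
    returns_sum = {s: 0.0 for s in (1, 2, 3)}
    returns_cnt = {s: 0 for s in (1, 2, 3)}
    for _ in range(n_episodes):
        s = 2
        trajectory = []  # (state, reward received on leaving that state)
        while s not in (0, 4):
            s2 = s + rng.choice((-1, 1))
            trajectory.append((s, 1.0 if s2 == 4 else 0.0))
            s = s2
        # index of each state's first visit in this episode
        first = {}
        for i, (st, _) in enumerate(trajectory):
            first.setdefault(st, i)
        # accumulate returns backwards; record G only at first visits
        G = 0.0
        for i in range(len(trajectory) - 1, -1, -1):
            st, r = trajectory[i]
            G = gamma * G + r
            if first[st] == i:
                returns_sum[st] += G
                returns_cnt[st] += 1
    # sample-mean estimate of V for each non-terminal state
    return {s: returns_sum[s] / max(returns_cnt[s], 1) for s in returns_sum}
```

Unlike dynamic programming, nothing here needs the transition probabilities: the value estimates come entirely from averaging sampled returns.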
Temporal Difference Learning
| Topic | Where to find it |
|---|---|
| Temporal Difference Introduction | Chapter 12 |
| TD(0) Prediction | Chapter 12 |
| TD(0) Prediction in Code | Dedicated page |
| SARSA | Chapter 13 |
| SARSA in Code | TD, SARSA, Q-Learning in Code |
| Q-Learning | Chapter 14 |
| Q-Learning in Code | TD, SARSA, Q-Learning in Code |
| TD Learning Section Summary | Chapter 12 – Chapter 14 |
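The Q-learning topic above can be sketched in a few lines of tabular code; the 1-D gridworld, action encoding, and hyperparameters are illustrative assumptions, not the course's implementation:

```python
import random

def q_learning(n_episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D gridworld: states 0..4,
    start at 0, actions -1 (left) and +1 (right), walls clamp movement,
    and entering terminal state 4 gives reward +1 (0 otherwise)."""
    rng = random.Random(seed)
    actions = (-1, 1)
    Q = {(s, a): 0.0 for s in range(5) for a in actions}

    def greedy(s):
        # argmax with random tie-breaking
        best = max(Q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if Q[(s, a)] == best])

    for _ in range(n_episodes):
        s = 0
        while s != 4:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            s2 = min(max(s + a, 0), 4)
            r = 1.0 if s2 == 4 else 0.0
            # off-policy target: bootstrap from the max, 0 at the terminal
            bootstrap = 0.0 if s2 == 4 else max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * bootstrap - Q[(s, a)])
            s = s2
    return Q
```

Swapping the `max` in the bootstrap for the Q-value of the action actually taken next would turn this into SARSA, the on-policy counterpart listed above.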
Approximation Methods
| Topic | Where to find it |
|---|---|
| Approximation Methods Section Introduction | Volume 3 |
| Linear Models for Reinforcement Learning | Chapter 21 |
| Feature Engineering | Dedicated page |
| Approximation Methods for Prediction | Chapter 21 |
| Approximation Methods for Prediction Code | Chapter 21 |
| Approximation Methods for Control | Chapter 22 – Chapter 30 |
| Approximation Methods for Control Code | Volume 3 |
| CartPole | Dedicated page |
| CartPole Code | CartPole |
| Approximation Methods Exercise | Volume 3 chapters |
| Approximation Methods Section Summary | Volume 3 |
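The linear-models and prediction topics above replace the value table with a parameterized function. Here is a semi-gradient TD(0) sketch with a linear value function; the random-walk environment and two-dimensional feature vector are illustrative assumptions chosen so the true values are exactly representable:

```python
import random

def semi_gradient_td0(n_episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Semi-gradient TD(0) with a linear value function v(s) = w . x(s)
    on a hypothetical 5-state random walk (terminals 0 and 4, reward +1
    on entering 4). Features x(s) = (s/4, 1), so the true values
    V(s) = s/4 are representable with w = (1, 0)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]

    def x(s):
        return (s / 4.0, 1.0)

    def v(s):
        if s in (0, 4):
            return 0.0  # terminal states are worth 0 by definition
        f = x(s)
        return w[0] * f[0] + w[1] * f[1]

    for _ in range(n_episodes):
        s = 2
        while s not in (0, 4):
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == 4 else 0.0
            # semi-gradient update: w += alpha * delta * grad_w v(s),
            # where the gradient of a linear v is just the feature vector
            delta = r + gamma * v(s2) - v(s)
            f = x(s)
            w[0] += alpha * delta * f[0]
            w[1] += alpha * delta * f[1]
            s = s2
    return w, v
```

The same update shape carries over to the control setting (and to CartPole): only the feature function and the policy change, not the semi-gradient rule itself.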
Interlude: Common Beginner Questions
| Topic | Where to find it |
|---|---|
| This Course vs. RL Book: What’s the Difference? | Dedicated page |
Stock Trading Project with Reinforcement Learning
| Topic | Where to find it |
|---|---|
| Beginners, halt! Stop here if you skipped ahead | Stock Trading intro |
| Stock Trading Project Section Introduction | Stock Trading |
| Data and Environment | Stock Trading: Data and Environment |
| How to Model Q for Q-Learning | Stock Trading: How to Model Q |
| Design of the Program | Stock Trading: Design |
| Code pt 1–4 | Stock Trading |
| Stock Trading Project Discussion | Stock Trading |
Appendix / FAQ
| Topic | Where to find it |
|---|---|
| What is the Appendix? | Appendix index |
| Setting Up Your Environment | Dedicated page |
| Pre-Installation Check | Setting Up Your Environment |
| Anaconda Environment Setup | Dedicated page |
| How to install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, TensorFlow | Installing Libraries |
| How to Code by Yourself (part 1) | Dedicated page |
| How to Code by Yourself (part 2) | Dedicated page |
| Proof that using Jupyter Notebook is the same as not using it | Appendix |
| Python 2 vs Python 3 | Prerequisites: Python |
| Effective Learning Strategies | Dedicated page |
| How to Succeed in this Course (Long Version) | Dedicated page |
| Is this for Beginners or Experts? Academic or Practical? Pace | Dedicated page |
| Machine Learning and AI Prerequisite Roadmap (pt 1–2) | Dedicated page |
Part 2 — Advanced (Volumes 4–10)
After the topics above, the curriculum continues with 70 more chapters in order:
| Volume | Topics |
|---|---|
| Volume 4: Policy Gradients | Policy-based methods, REINFORCE, actor-critic, A2C, A3C, DDPG, TD3 (Ch 31–40) |
| Volume 5: Advanced Policy Optimization | TRPO, PPO, SAC, hyperparameter tuning (Ch 41–50) |
| Volume 6: Model-Based RL & Planning | World models, MCTS, AlphaZero, Dreamer, MBPO, PETS (Ch 51–60) |
| Volume 7: Exploration and Meta-Learning | Hard exploration, intrinsic motivation, RND, Go-Explore, MAML, RL² (Ch 61–70) |
| Volume 8: Offline RL & Imitation Learning | CQL, Decision Transformers, behavioral cloning, IRL, GAIL, RLHF (Ch 71–80) |
| Volume 9: Multi-Agent RL (MARL) | Game theory, IQL, CTDE, MADDPG, VDN, QMIX, MAPPO (Ch 81–90) |
| Volume 10: Real-World RL, Safety & LLMs | Robotics, safe RL, trading, recommenders, RLHF for LLMs, evaluation (Ch 91–100) |
See the full Curriculum for all 100 chapters.