This page lists every topic in the intended order: from the welcome material and bandits through MDPs, dynamic programming, Monte Carlo, temporal difference learning, approximation methods, projects, and the appendix. Follow this outline for a clear beginner-to-advanced path. Each item links to the relevant curriculum chapter, prerequisite, or dedicated page.
Welcome

| Topic | Where to find it |
| --- | --- |
| Introduction | Home |
| Course Outline and Big Picture | This page |
| Where to get the Code | Dedicated page |
| How to Succeed in this Course | Dedicated page |

Warmup — Multi-Armed Bandit

| Topic | Where to find it |
| --- | --- |
| Section Introduction: The Explore-Exploit Dilemma | Chapter 2: Multi-Armed Bandits |
| Applications of the Explore-Exploit Dilemma | Chapter 2 |
| Epsilon-Greedy Theory | Chapter 2 |
| Calculating a Sample Mean (pt 1) | Math for RL: Probability |
| Epsilon-Greedy Beginner’s Exercise Prompt | Chapter 2 |
| Designing Your Bandit Program | Chapter 2 |
| Epsilon-Greedy in Code | Chapter 2 |
| Comparing Different Epsilons | Chapter 2 |
| Optimistic Initial Values Theory | Chapter 2 (hints); Bandits: Optimistic Initial Values |
| Optimistic Initial Values Beginner’s Exercise Prompt | Bandits: Optimistic Initial Values |
| Optimistic Initial Values Code | Bandits: Optimistic Initial Values |
| UCB1 Theory | Dedicated page |
| UCB1 Beginner’s Exercise Prompt | Bandits: UCB1 |
| UCB1 Code | Bandits: UCB1 |
| Bayesian Bandits / Thompson Sampling Theory (pt 1) | Dedicated page |
| Bayesian Bandits / Thompson Sampling Theory (pt 2) | Bandits: Thompson Sampling |
| Thompson Sampling Beginner’s Exercise Prompt | Bandits: Thompson Sampling |
| Thompson Sampling Code | Bandits: Thompson Sampling |
| Thompson Sampling With Gaussian Reward Theory | Bandits: Thompson Sampling |
| Thompson Sampling With Gaussian Reward Code | Bandits: Thompson Sampling |
| Exercise on Gaussian Rewards | Bandits: Thompson Sampling |
| Why don’t we just use a library? | Dedicated page |
| Nonstationary Bandits | Dedicated page |
| Bandit Summary, Real Data, and Online Learning | Chapter 2; Bandits: Nonstationary |
| (Optional) Alternative Bandit Designs | Chapter 2 |

High-Level Overview of Reinforcement Learning

| Topic | Where to find it |
| --- | --- |
| What is Reinforcement Learning? | Chapter 1 |
| From Bandits to Full Reinforcement Learning | Chapter 1, Chapter 2 |
| Markov Decision Processes | Chapter 3 |

MDP Section

| Topic | Where to find it |
| --- | --- |
| MDP Section Introduction | Chapter 3: MDPs |
| Gridworld | Dedicated page |
| Choosing Rewards | Dedicated page |
| The Markov Property | Chapter 3 |
| Markov Decision Processes (MDPs) | Chapter 3 |
| Future Rewards | Chapter 4: Reward Hypothesis, Chapter 5: Value Functions |
| Value Functions | Chapter 5 |
| The Bellman Equation (pt 1–3) | Chapter 6: The Bellman Equations |
| Bellman Examples | Chapter 6 |
| Optimal Policy and Optimal Value Function (pt 1–2) | Chapter 6 |
| MDP Summary | Chapter 3 – Chapter 6 |

Dynamic Programming

| Topic | Where to find it |
| --- | --- |
| Dynamic Programming Section Introduction | Volume 1 |
| Iterative Policy Evaluation | Chapter 7 |
| Designing Your RL Program | Chapter 7 |
| Gridworld in Code | Dedicated page |
| Iterative Policy Evaluation in Code | Dedicated page |
| Windy Gridworld | Dedicated page |
| Iterative Policy Evaluation for Windy Gridworld | Windy Gridworld |
| Policy Improvement | Chapter 8: Policy Iteration |
| Policy Iteration | Chapter 8 |
| Policy Iteration in Code | Chapter 8; DP code walkthrough |
| Policy Iteration in Windy Gridworld | Windy Gridworld |
| Value Iteration | Chapter 9 |
| Value Iteration in Code | Chapter 9; DP code walkthrough |
| Dynamic Programming Summary | Chapter 10: Limitations of DP |

Monte Carlo

| Topic | Where to find it |
| --- | --- |
| Monte Carlo Intro | Chapter 11 |
| Monte Carlo Policy Evaluation | Chapter 11 |
| Monte Carlo Policy Evaluation in Code | Dedicated page |
| Monte Carlo Control | Chapter 11 |
| Monte Carlo Control in Code | Monte Carlo in Code |
| Monte Carlo Control without Exploring Starts | Chapter 11; Monte Carlo in Code |
| Monte Carlo Control without Exploring Starts in Code | Monte Carlo in Code |
| Monte Carlo Summary | Chapter 11 |

Temporal Difference Learning

| Topic | Where to find it |
| --- | --- |
| Temporal Difference Introduction | Chapter 12 |
| TD(0) Prediction | Chapter 12 |
| TD(0) Prediction in Code | Dedicated page |
| SARSA | Chapter 13 |
| SARSA in Code | TD, SARSA, Q-Learning in Code |
| Q-Learning | Chapter 14 |
| Q-Learning in Code | TD, SARSA, Q-Learning in Code |
| TD Learning Section Summary | Chapter 12 – Chapter 14 |

Approximation Methods

| Topic | Where to find it |
| --- | --- |
| Approximation Methods Section Introduction | Volume 3 |
| Linear Models for Reinforcement Learning | Chapter 21 |
| Feature Engineering | Dedicated page |
| Approximation Methods for Prediction | Chapter 21 |
| Approximation Methods for Prediction Code | Chapter 21 |
| Approximation Methods for Control | Chapter 22 – Chapter 30 |
| Approximation Methods for Control Code | Volume 3 |
| CartPole | Dedicated page |
| CartPole Code | CartPole |
| Approximation Methods Exercise | Volume 3 chapters |
| Approximation Methods Section Summary | Volume 3 |

Interlude: Common Beginner Questions

| Topic | Where to find it |
| --- | --- |
| This Course vs. RL Book: What’s the Difference? | Dedicated page |

Stock Trading Project with Reinforcement Learning (Dedicated section)

| Topic | Where to find it |
| --- | --- |
| Beginners, halt! Stop here if you skipped ahead | Stock Trading intro |
| Stock Trading Project Section Introduction | Stock Trading |
| Data and Environment | Stock Trading: Data and Environment |
| How to Model Q for Q-Learning | Stock Trading: How to Model Q |
| Design of the Program | Stock Trading: Design |
| Code pt 1–4 | Stock Trading |
| Stock Trading Project Discussion | Stock Trading |

Appendix / FAQ

| Topic | Where to find it |
| --- | --- |
| What is the Appendix? | Appendix index |
| Setting Up Your Environment | Dedicated page |
| Pre-Installation Check | Setting Up Your Environment |
| Anaconda Environment Setup | Dedicated page |
| How to install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, TensorFlow | Installing Libraries |
| How to Code by Yourself (part 1) | Dedicated page |
| How to Code by Yourself (part 2) | Dedicated page |
| Proof that using Jupyter Notebook is the same as not using it | Appendix |
| Python 2 vs Python 3 | Prerequisites: Python |
| Effective Learning Strategies | Dedicated page |
| How to Succeed in this Course (Long Version) | Dedicated page |
| Is this for Beginners or Experts? Academic or Practical? Pace | Dedicated page |
| Machine Learning and AI Prerequisite Roadmap (pt 1–2) | Dedicated page |

Part 2 — Advanced (Volumes 4–10)

After the topics above, the curriculum continues with 70 more chapters in order:
...
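As a taste of the warmup material above (the explore-exploit dilemma and the incremental sample-mean update), here is a minimal epsilon-greedy bandit sketch. This is an illustrative assumption, not the course's actual code: the function name, the Gaussian reward model, and the three-armed testbed are all made up for this example.

```python
import random

def run_epsilon_greedy(true_means, epsilon=0.1, steps=10_000, seed=0):
    """Simulate an epsilon-greedy agent on a Gaussian bandit.

    true_means: each arm's true mean reward (unknown to the agent).
    Returns per-arm sample-mean estimates and pull counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # sample mean of observed rewards per arm
    counts = [0] * n_arms       # number of times each arm was pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # incremental sample mean: Q_n = Q_{n-1} + (x_n - Q_{n-1}) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts

estimates, counts = run_epsilon_greedy([0.2, 0.5, 0.8])
```

With enough steps, the best arm (here the one with true mean 0.8) accumulates the large majority of pulls, while the forced exploration keeps every arm's sample mean converging toward its true value.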