Site archive — full page index for the RL curriculum

2026 ²¹⁸

March ²¹⁸

Deep Reinforcement Learning (module view)

March 24, 2026 · 1 min · 34 words · codefrydev

Biological Inspiration: From Brain Neurons to Artificial Neurons

March 20, 2026 · 5 min · 885 words · codefrydev

What is Machine Learning?

March 20, 2026 · 5 min · 890 words · codefrydev

Datasets and Features

March 20, 2026 · 5 min · 885 words · codefrydev

Statistics for RL

March 20, 2026 · 10 min · 1928 words · codefrydev

The Perceptron: Learning from Mistakes

March 20, 2026 · 5 min · 920 words · codefrydev

Activation Functions: Adding Non-Linearity

March 20, 2026 · 5 min · 873 words · codefrydev

Linear Regression

March 20, 2026 · 5 min · 878 words · codefrydev

Checkpoint: ML Foundations Mid-Point

March 20, 2026 · 2 min · 396 words · codefrydev

Gradient Descent

March 20, 2026 · 5 min · 890 words · codefrydev

Multi-Layer Perceptrons: Stacking Layers to Break Linearity

March 20, 2026 · 4 min · 825 words · codefrydev

Forward Propagation: Computing the Network Output

March 20, 2026 · 5 min · 869 words · codefrydev

Multiple Regression

March 20, 2026 · 5 min · 875 words · codefrydev

Phase 4 Assessment: Machine Learning Foundations

March 20, 2026 · 6 min · 1167 words · codefrydev

Checkpoint: DL Foundations Mid-Point

March 20, 2026 · 3 min · 460 words · codefrydev

Classification Concepts

March 20, 2026 · 4 min · 785 words · codefrydev

Loss Functions: Measuring How Wrong the Network Is

March 20, 2026 · 4 min · 773 words · codefrydev

Backpropagation: Teaching Networks by Propagating Errors

March 20, 2026 · 5 min · 930 words · codefrydev

Logistic Regression

March 20, 2026 · 5 min · 866 words · codefrydev

Phase 5 Assessment: Deep Learning Foundations

March 20, 2026 · 7 min · 1326 words · codefrydev

Model Evaluation

March 20, 2026 · 4 min · 745 words · codefrydev

Optimizers: SGD, Momentum, and Adam

March 20, 2026 · 4 min · 754 words · codefrydev

Cross-Validation and Overfitting

March 20, 2026 · 4 min · 752 words · codefrydev

The Training Loop

March 20, 2026 · 3 min · 624 words · codefrydev

K-Nearest Neighbors

March 20, 2026 · 3 min · 625 words · codefrydev

Regularization and Overfitting

March 20, 2026 · 4 min · 659 words · codefrydev

CNN Basics: Convolutions and Pooling

March 20, 2026 · 4 min · 721 words · codefrydev

Decision Trees

March 20, 2026 · 4 min · 741 words · codefrydev

K-Means Clustering

March 20, 2026 · 4 min · 647 words · codefrydev

PyTorch: Building Neural Networks with nn.Module

March 20, 2026 · 5 min · 988 words · codefrydev

DL Mini-Project: Digits Classifier in NumPy

March 20, 2026 · 3 min · 549 words · codefrydev

Scikit-Learn Workflow

March 20, 2026 · 3 min · 587 words · codefrydev

ML Mini-Project: Wine Classification

March 20, 2026 · 3 min · 628 words · codefrydev

DL Foundations Drills

March 20, 2026 · 5 min · 1023 words · codefrydev

ML Foundations Drills

March 20, 2026 · 5 min · 1043 words · codefrydev

DL Foundations Review & Bridge to RL

March 20, 2026 · 4 min · 819 words · codefrydev

ML Foundations Review & Bridge to Deep Learning

March 20, 2026 · 4 min · 785 words · codefrydev

Phase 0 Assessment: Python Basics

March 19, 2026 · 3 min · 564 words · codefrydev

Python Confidence Builder

March 19, 2026 · 13 min · 2600 words · codefrydev

RL in Plain English

March 19, 2026 · 10 min · 2019 words · codefrydev

Bridge Exercises: Python + Math + RL

March 19, 2026 · 10 min · 1983 words · codefrydev

Checkpoint: Volume 1, Midpoint (After Chapter 5)

March 19, 2026 · 2 min · 304 words · codefrydev

Checkpoint: Volume 2, Midpoint (After Chapter 15)

March 19, 2026 · 3 min · 479 words · codefrydev

How to Debug RL Code

March 19, 2026 · 7 min · 1308 words · codefrydev

Checkpoint: Volume 3, Midpoint (After Chapter 25)

March 19, 2026 · 3 min · 590 words · codefrydev

How to Read RL Papers

March 19, 2026 · 4 min · 851 words · codefrydev

Checkpoint: Volume 4, Midpoint (After Chapter 35)

March 19, 2026 · 3 min · 570 words · codefrydev

Checkpoint: Volume 5, Midpoint (After Chapter 45)

March 19, 2026 · 4 min · 648 words · codefrydev

Phase 8 Assessment: Advanced RL

March 19, 2026 · 6 min · 1195 words · codefrydev

Reinforcement learning glossary — terms, definitions, and chapter links

March 19, 2026 · 13 min · 2585 words · codefrydev

Volume 1 Drills — Mathematical Foundations

March 19, 2026 · 6 min · 1121 words · codefrydev

Volume 2 Drills — Tabular Model-Free Methods

March 19, 2026 · 7 min · 1404 words · codefrydev

Volume 3 Drills — Function Approximation & DQN

March 19, 2026 · 8 min · 1595 words · codefrydev

Volume 1 Review & Bridge to Volume 2

March 19, 2026 · 3 min · 608 words · codefrydev

Volume 2 Review & Bridge to Volume 3

March 19, 2026 · 4 min · 663 words · codefrydev

Volume 3 Review & Bridge to Volume 4

March 19, 2026 · 2 min · 350 words · codefrydev

Volume 4 Review & Bridge to Volume 5

March 19, 2026 · 3 min · 467 words · codefrydev

Volume 5 Review & Bridge to Volume 6

March 19, 2026 · 3 min · 489 words · codefrydev

Volume 6 Review & Bridge to Volume 7

March 19, 2026 · 3 min · 498 words · codefrydev

Volume 7 Review & Bridge to Volume 8

March 19, 2026 · 3 min · 569 words · codefrydev

Volume 8 Review & Bridge to Volume 9

March 19, 2026 · 3 min · 538 words · codefrydev

Volume 9 Review & Bridge to Volume 10

March 19, 2026 · 3 min · 532 words · codefrydev

Chapter 1: The Reinforcement Learning Framework

March 10, 2026 · 5 min · 902 words · codefrydev

Course Outline

March 10, 2026 · 6 min · 1175 words · codefrydev

Is this for Beginners or Experts? Academic or Practical? Fast or slow-paced?

March 10, 2026 · 2 min · 378 words · codefrydev

Probability & Statistics

March 10, 2026 · 13 min · 2558 words · codefrydev

Python basics for RL and the preliminary assessment

March 10, 2026 · 5 min · 853 words · codefrydev

Chapter 2: Multi-Armed Bandits

March 10, 2026 · 4 min · 807 words · codefrydev

How to Succeed in this Course (Long Version)

March 10, 2026 · 2 min · 406 words · codefrydev

NumPy

March 10, 2026 · 4 min · 793 words · codefrydev

Phase 1 Self-Check: Math for RL

March 10, 2026 · 5 min · 858 words · codefrydev

Bandits: Optimistic Initial Values

March 10, 2026 · 2 min · 305 words · codefrydev

Chapter 3: Markov Decision Processes (MDPs)

March 10, 2026 · 4 min · 781 words · codefrydev

Effective Learning Strategies for Machine Learning

March 10, 2026 · 2 min · 292 words · codefrydev

Linear Algebra

March 10, 2026 · 13 min · 2571 words · codefrydev

Phase 2 Readiness Quiz

March 10, 2026 · 4 min · 656 words · codefrydev

Probability & Statistics

March 10, 2026 · 5 min · 1062 words · codefrydev

Bandits: UCB1

March 10, 2026 · 2 min · 319 words · codefrydev

Calculus

March 10, 2026 · 11 min · 2332 words · codefrydev

Chapter 4: The Reward Hypothesis

March 10, 2026 · 4 min · 806 words · codefrydev

Gridworld

March 10, 2026 · 2 min · 356 words · codefrydev

Linear Algebra

March 10, 2026 · 5 min · 922 words · codefrydev

Machine Learning and AI Prerequisite Roadmap (pt 1–2)

March 10, 2026 · 2 min · 320 words · codefrydev

Anaconda Environment Setup

March 10, 2026 · 2 min · 237 words · codefrydev

Bandits: Thompson Sampling

March 10, 2026 · 2 min · 401 words · codefrydev

Calculus

March 10, 2026 · 4 min · 793 words · codefrydev

Chapter 5: Value Functions

March 10, 2026 · 4 min · 724 words · codefrydev

Choosing Rewards

March 10, 2026 · 2 min · 354 words · codefrydev

Bandits: Nonstationary

March 10, 2026 · 2 min · 363 words · codefrydev

Chapter 6: The Bellman Equations

March 10, 2026 · 4 min · 688 words · codefrydev

RL Framework

March 10, 2026 · 6 min · 1198 words · codefrydev

Setting Up Your Environment

March 10, 2026 · 2 min · 229 words · codefrydev

Bandits: Why don’t we just use a library?

March 10, 2026 · 2 min · 289 words · codefrydev

Chapter 7: Dynamic Programming — Policy Evaluation

March 10, 2026 · 4 min · 811 words · codefrydev

How to Install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, and TensorFlow

March 10, 2026 · 2 min · 279 words · codefrydev

Tabular Methods

March 10, 2026 · 6 min · 1277 words · codefrydev

Chapter 8: Dynamic Programming — Policy Iteration

March 10, 2026 · 4 min · 762 words · codefrydev

How to Code by Yourself (part 1)

March 10, 2026 · 2 min · 312 words · codefrydev

Value Functions and Bellman Equation

March 10, 2026 · 5 min · 906 words · codefrydev

Windy Gridworld

March 10, 2026 · 2 min · 392 words · codefrydev

Chapter 9: Dynamic Programming — Value Iteration

March 10, 2026 · 4 min · 733 words · codefrydev

Dynamic Programming: Gridworld in Code

March 10, 2026 · 2 min · 390 words · codefrydev

Function Approximation and Deep RL

March 10, 2026 · 7 min · 1400 words · codefrydev

How to Code by Yourself (part 2)

March 10, 2026 · 2 min · 346 words · codefrydev

Chapter 10: Limitations of Dynamic Programming

March 10, 2026 · 4 min · 826 words · codefrydev

Phase 6 Assessment: RL Foundations

March 10, 2026 · 5 min · 876 words · codefrydev

Python

March 10, 2026 · 9 min · 1810 words · codefrydev

PyTorch Basics

March 10, 2026 · 5 min · 926 words · codefrydev

Chapter 11: Monte Carlo Methods

March 10, 2026 · 5 min · 895 words · codefrydev

Final Self-Assessment

March 10, 2026 · 3 min · 448 words · codefrydev

Chapter 12: Temporal Difference (TD) Learning

March 10, 2026 · 4 min · 744 words · codefrydev

Monte Carlo in Code

March 10, 2026 · 3 min · 464 words · codefrydev

Chapter 13: SARSA (On-Policy TD Control)

March 10, 2026 · 3 min · 639 words · codefrydev

Phase 7 Assessment: Deep RL

March 10, 2026 · 4 min · 814 words · codefrydev

TD, SARSA, and Q-Learning in Code

March 10, 2026 · 2 min · 351 words · codefrydev

Chapter 14: Q-Learning (Off-Policy TD Control)

March 10, 2026 · 4 min · 700 words · codefrydev

Chapter 15: Expected SARSA

March 10, 2026 · 4 min · 708 words · codefrydev

Chapter 16: N-Step Bootstrapping

March 10, 2026 · 4 min · 653 words · codefrydev

Chapter 17: Planning and Learning with Tabular Methods

March 10, 2026 · 4 min · 686 words · codefrydev

Chapter 18: Custom Gym Environments (Part 1)

March 10, 2026 · 4 min · 663 words · codefrydev

Chapter 19: Hyperparameter Tuning in Tabular RL

March 10, 2026 · 4 min · 713 words · codefrydev

Chapter 20: The Limits of Tabular Methods

March 10, 2026 · 4 min · 761 words · codefrydev

NumPy

March 10, 2026 · 7 min · 1399 words · codefrydev

Chapter 21: Linear Function Approximation

March 10, 2026 · 4 min · 713 words · codefrydev

Feature Engineering for Reinforcement Learning

March 10, 2026 · 2 min · 400 words · codefrydev

CartPole

March 10, 2026 · 3 min · 451 words · codefrydev

Chapter 22: Artificial Neural Networks for RL

March 10, 2026 · 4 min · 655 words · codefrydev

Chapter 23: Deep Q-Networks (DQN)

March 10, 2026 · 4 min · 652 words · codefrydev

Chapter 24: Experience Replay

March 10, 2026 · 4 min · 715 words · codefrydev

Chapter 25: Target Networks

March 10, 2026 · 4 min · 699 words · codefrydev

Chapter 26: Double DQN (DDQN)

March 10, 2026 · 3 min · 630 words · codefrydev

Chapter 27: Dueling DQN

March 10, 2026 · 4 min · 693 words · codefrydev

Chapter 28: Prioritized Experience Replay (PER)

March 10, 2026 · 4 min · 747 words · codefrydev

Chapter 29: Noisy Networks for Exploration

March 10, 2026 · 4 min · 760 words · codefrydev

Chapter 30: Rainbow DQN

March 10, 2026 · 4 min · 693 words · codefrydev

Pandas

March 10, 2026 · 4 min · 764 words · codefrydev

Chapter 31: Introduction to Policy-Based Methods

March 10, 2026 · 4 min · 678 words · codefrydev

Chapter 32: The Policy Objective Function

March 10, 2026 · 4 min · 713 words · codefrydev

Chapter 33: The REINFORCE Algorithm

March 10, 2026 · 4 min · 720 words · codefrydev

Chapter 34: Reducing Variance in Policy Gradients

March 10, 2026 · 4 min · 715 words · codefrydev

Chapter 35: Actor-Critic Architectures

March 10, 2026 · 4 min · 690 words · codefrydev

Visualization & Plotting for RL

March 10, 2026 · 6 min · 1180 words · codefrydev

Chapter 36: Advantage Actor-Critic (A2C)

March 10, 2026 · 4 min · 689 words · codefrydev

Chapter 37: Asynchronous Advantage Actor-Critic (A3C)

March 10, 2026 · 4 min · 673 words · codefrydev

Chapter 38: Continuous Action Spaces

March 10, 2026 · 4 min · 689 words · codefrydev

Chapter 39: Deep Deterministic Policy Gradient (DDPG)

March 10, 2026 · 4 min · 643 words · codefrydev

Chapter 40: Twin Delayed DDPG (TD3)

March 10, 2026 · 4 min · 668 words · codefrydev

Matplotlib

March 10, 2026 · 5 min · 969 words · codefrydev

Chapter 41: The Problem with Standard Policy Gradients

March 10, 2026 · 4 min · 738 words · codefrydev

Chapter 42: Trust Region Policy Optimization (TRPO)

March 10, 2026 · 4 min · 682 words · codefrydev

Chapter 43: Proximal Policy Optimization (PPO): Intuition

March 10, 2026 · 4 min · 666 words · codefrydev

Chapter 44: PPO: Implementation Details

March 10, 2026 · 4 min · 649 words · codefrydev

Chapter 45: Coding PPO from Scratch

March 10, 2026 · 4 min · 656 words · codefrydev

Chapter 46: Maximum Entropy RL

March 10, 2026 · 4 min · 660 words · codefrydev

Chapter 47: Soft Actor-Critic (SAC)

March 10, 2026 · 3 min · 631 words · codefrydev

Chapter 48: SAC vs. PPO

March 10, 2026 · 3 min · 619 words · codefrydev

Chapter 49: Custom Gym Environments (Part 2)

March 10, 2026 · 4 min · 649 words · codefrydev

Chapter 50: Advanced Hyperparameter Tuning

March 10, 2026 · 3 min · 604 words · codefrydev

PyTorch

March 10, 2026 · 5 min · 1052 words · codefrydev

Chapter 51: Model-Free vs. Model-Based RL

March 10, 2026 · 3 min · 552 words · codefrydev

Chapter 52: Learning World Models

March 10, 2026 · 3 min · 542 words · codefrydev

Chapter 53: Planning with Known Models

March 10, 2026 · 3 min · 550 words · codefrydev

Chapter 54: Monte Carlo Tree Search (MCTS)

March 10, 2026 · 3 min · 559 words · codefrydev

Chapter 55: AlphaZero Architecture

March 10, 2026 · 3 min · 563 words · codefrydev

Chapter 56: MuZero Intuition

March 10, 2026 · 3 min · 572 words · codefrydev

Chapter 57: Dreamer and Latent Imagination

March 10, 2026 · 3 min · 571 words · codefrydev

Chapter 58: Model-Based Policy Optimization (MBPO)

March 10, 2026 · 3 min · 584 words · codefrydev

Chapter 59: Probabilistic Ensembles with Trajectory Sampling (PETS)

March 10, 2026 · 3 min · 593 words · codefrydev

Chapter 60: Visualizing Model-Based Rollouts

March 10, 2026 · 3 min · 584 words · codefrydev

TensorFlow

March 10, 2026 · 5 min · 1051 words · codefrydev

Chapter 61: The Hard Exploration Problem

March 10, 2026 · 3 min · 590 words · codefrydev

Chapter 62: Intrinsic Motivation

March 10, 2026 · 3 min · 588 words · codefrydev

Chapter 63: Curiosity-Driven Exploration (ICM)

March 10, 2026 · 4 min · 744 words · codefrydev

Chapter 64: Random Network Distillation (RND)

March 10, 2026 · 4 min · 746 words · codefrydev

Chapter 65: Count-Based Exploration

March 10, 2026 · 4 min · 757 words · codefrydev

Chapter 66: Go-Explore Algorithm

March 10, 2026 · 5 min · 877 words · codefrydev

Chapter 67: Meta-Learning (Learning to Learn)

March 10, 2026 · 4 min · 833 words · codefrydev

Chapter 68: Model-Agnostic Meta-Learning (MAML) in RL

March 10, 2026 · 4 min · 751 words · codefrydev

Chapter 69: RL² (Reinforcement Learning as an RNN)

March 10, 2026 · 4 min · 833 words · codefrydev

Chapter 70: Unsupervised Environment Design

March 10, 2026 · 5 min · 861 words · codefrydev

OpenAI Gym / Gymnasium

March 10, 2026 · 6 min · 1082 words · codefrydev

Chapter 71: The Offline RL Problem

March 10, 2026 · 4 min · 845 words · codefrydev

Chapter 72: Conservative Q-Learning (CQL)

March 10, 2026 · 4 min · 804 words · codefrydev

Chapter 73: Decision Transformers

March 10, 2026 · 4 min · 843 words · codefrydev

Chapter 74: Introduction to Imitation Learning

March 10, 2026 · 4 min · 742 words · codefrydev

Chapter 75: Limitations of Behavioral Cloning

March 10, 2026 · 5 min · 941 words · codefrydev

Chapter 76: Inverse Reinforcement Learning (IRL)

March 10, 2026 · 5 min · 882 words · codefrydev

Chapter 77: Generative Adversarial Imitation Learning (GAIL)

March 10, 2026 · 4 min · 828 words · codefrydev

Chapter 78: Adversarial Motion Priors (AMP)

March 10, 2026 · 4 min · 850 words · codefrydev

Chapter 79: Offline-to-Online Finetuning

March 10, 2026 · 5 min · 881 words · codefrydev

Chapter 80: RL from Human Feedback (RLHF) Basics

March 10, 2026 · 4 min · 851 words · codefrydev

Other Libraries

March 10, 2026 · 4 min · 803 words · codefrydev

Chapter 81: Multi-Agent Fundamentals

March 10, 2026 · 4 min · 796 words · codefrydev

Chapter 82: Game Theory Basics for RL

March 10, 2026 · 4 min · 805 words · codefrydev

Chapter 83: Independent Q-Learning (IQL)

March 10, 2026 · 4 min · 851 words · codefrydev

Chapter 84: Centralized Training, Decentralized Execution (CTDE)

March 10, 2026 · 5 min · 891 words · codefrydev

Chapter 85: Multi-Agent DDPG (MADDPG)

March 10, 2026 · 4 min · 792 words · codefrydev

Chapter 86: Value Decomposition Networks (VDN)

March 10, 2026 · 4 min · 812 words · codefrydev

Chapter 87: QMIX Algorithm

March 10, 2026 · 4 min · 804 words · codefrydev

Chapter 88: Multi-Agent PPO (MAPPO)

March 10, 2026 · 4 min · 811 words · codefrydev

Chapter 89: Self-Play and League Training

March 10, 2026 · 5 min · 887 words · codefrydev

Chapter 90: Communication in MARL

March 10, 2026 · 5 min · 873 words · codefrydev

Chapter 91: RL in Robotics

March 10, 2026 · 4 min · 792 words · codefrydev

Chapter 92: Safe Reinforcement Learning

March 10, 2026 · 4 min · 849 words · codefrydev

Chapter 93: RL for Algorithmic Trading

March 10, 2026 · 4 min · 775 words · codefrydev

Chapter 94: RL in Recommender Systems

March 10, 2026 · 4 min · 826 words · codefrydev

Chapter 95: Training Large Language Models with PPO

March 10, 2026 · 5 min · 878 words · codefrydev

Chapter 96: Implementing RLHF in NLP

March 10, 2026 · 4 min · 851 words · codefrydev

Chapter 97: Direct Preference Optimization (DPO)

March 10, 2026 · 4 min · 823 words · codefrydev

Chapter 98: Evaluating RL Agents

March 10, 2026 · 5 min · 854 words · codefrydev

Chapter 99: Debugging RL Code

March 10, 2026 · 5 min · 865 words · codefrydev

Chapter 100: The Future of Reinforcement Learning

March 10, 2026 · 5 min · 874 words · codefrydev

How to Succeed in this Course

March 10, 2026 · 1 min · 208 words · codefrydev

Real-World Scenarios in This Curriculum

March 10, 2026 · 3 min · 563 words · codefrydev

Stock Trading Project with Reinforcement Learning

March 10, 2026 · 4 min · 717 words · codefrydev

This Course vs. RL Book: What’s the Difference?

March 10, 2026 · 2 min · 405 words · codefrydev

Where to Get the Code

March 10, 2026 · 2 min · 240 words · codefrydev

Worked Solutions Index

March 10, 2026 · 2 min · 285 words · codefrydev
