Monte Carlo and temporal-difference methods, SARSA and Q-learning, n-step bootstrapping, planning with tabular methods, custom Gym environments, and the limits of tabular methods. Chapters 11–20.
Chapter 11: Monte Carlo Methods
Learning objectives

- Implement first-visit Monte Carlo prediction: estimate \(V^\pi(s)\) by averaging the returns observed from the first time \(s\) is visited in each episode.
- Use a Gym/Gymnasium blackjack environment with a fixed policy (stick on 20 or 21, else hit).
- Interpret the value estimates for key states (e.g. usable ace, dealer showing 10).

Concept and real-world RL

Monte Carlo (MC) methods estimate value functions from experience: run episodes under a policy, compute the return from each state (or state-action pair), and average those returns. First-visit MC uses only the first time each state appears in an episode; every-visit MC uses every occurrence. No model (transition probabilities) is needed, only sample trajectories. In RL, MC is used when full episodes are available (e.g. games and other episodic tasks) and simple, unbiased estimates are wanted. Game AI is a natural fit: blackjack has a small state space (player sum, dealer's showing card, usable ace), stochastic transitions (card draws), and a clear "stick or hit" policy to evaluate. The same idea applies to evaluating a fixed strategy in any episodic game: run many episodes and average the returns from each state. ...
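The procedure above can be sketched in a few dozen lines. The snippet below is a minimal, self-contained illustration: instead of depending on Gymnasium's `Blackjack-v1`, it includes a tiny simplified blackjack simulator (infinite deck, dealer hits below 17, no special payout for naturals) so it runs with the standard library alone; the function and variable names are illustrative, not from any particular library.

```python
# First-visit Monte Carlo prediction for a fixed blackjack policy
# (stick on 20/21, else hit). A sketch standing in for Gymnasium's
# Blackjack-v1; rules are simplified (infinite deck, no natural bonus).
import random
from collections import defaultdict

def draw_card(rng):
    # Cards 1-10; J/Q/K count as 10, so 10 is drawn with probability 4/13.
    return min(rng.randint(1, 13), 10)

def hand_value(cards):
    # Returns (total, usable_ace): an ace counts as 11 if that doesn't bust.
    total, has_ace = sum(cards), 1 in cards
    if has_ace and total + 10 <= 21:
        return total + 10, True
    return total, False

def play_episode(rng):
    """One hand under the fixed policy; returns (visited states, return)."""
    player = [draw_card(rng), draw_card(rng)]
    dealer = [draw_card(rng), draw_card(rng)]
    showing = dealer[0]
    states = []
    while True:
        total, usable = hand_value(player)
        if total >= 12:  # sums below 12 are always hit; commonly excluded
            states.append((total, showing, usable))
        if total >= 20:  # fixed policy: stick on 20 or 21
            break
        player.append(draw_card(rng))
        if hand_value(player)[0] > 21:
            return states, -1.0  # player busts
    while hand_value(dealer)[0] < 17:  # dealer's fixed rule: hit below 17
        dealer.append(draw_card(rng))
    d_total, p_total = hand_value(dealer)[0], hand_value(player)[0]
    if d_total > 21 or p_total > d_total:
        return states, 1.0
    return (states, 0.0) if p_total == d_total else (states, -1.0)

def first_visit_mc(n_episodes=200_000, seed=0):
    """Estimate V(s) by averaging returns from the first visit to s."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(n_episodes):
        states, g = play_episode(rng)
        seen = set()
        for s in states:  # undiscounted episodic task: return is g throughout
            if s not in seen:
                seen.add(s)
                returns_sum[s] += g
                returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

V = first_visit_mc()
print(V[(20, 10, False)])  # sticking on 20 vs. a dealer 10: clearly positive
print(V[(12, 10, False)])  # hitting toward 20 from 12 vs. 10: clearly negative
```

With `Blackjack-v1`, the same estimator applies unchanged: only `play_episode` would be replaced by `env.reset()`/`env.step()` calls that record the observed `(player_sum, dealer_card, usable_ace)` states and the terminal reward.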