Monte Carlo and temporal-difference methods, SARSA and Q-learning, n-step bootstrapping, planning with tabular methods, custom Gym environments, and the limits of tabular methods. Chapters 11–20.
Volume 2: Tabular Methods & Classic Algorithms
Chapters 11–20: Monte Carlo, TD, SARSA, Q-learning, Expected SARSA, n-step methods, Dyna-Q, custom Gym environments, hyperparameter tuning.
First-visit MC prediction for blackjack.
TD(0) prediction for blackjack; compare with Monte Carlo.
Code walkthrough for Monte Carlo policy evaluation and Monte Carlo control, with and without exploring starts.
SARSA on Cliff Walking; plot sum of rewards per episode.
Code walkthrough for TD(0) prediction, SARSA, and Q-learning (tabular).
Q-learning on Cliff Walking; compare with SARSA.
Expected SARSA vs Q-learning; variance and learning curves.
n-step SARSA (n=4) on Cliff Walking.
Dyna-Q on a 4×4 deterministic gridworld.
Custom 2D maze Gym environment with text rendering.
Grid search over α and ε for Q-learning on Cliff Walking.
Memory for Backgammon Q-table; necessity of function approximation.
15 short drill problems for Volume 2: Monte Carlo, TD(0), SARSA, Q-learning, and n-step methods.
Review of the Volume 2 tabular methods and a preview of Volume 3: from Q-tables to neural network function approximation.
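
As a quick orientation for the control chapters listed above, SARSA, Q-learning, and Expected SARSA share the same tabular update and differ only in the bootstrap target. The sketch below is illustrative, not taken from the chapters: the function names, the list-of-floats representation of Q(s', ·), and the ε-greedy behaviour policy with ties broken by lowest action index are all assumptions made for the example.

```python
# Minimal sketch (hypothetical helper names) of the three one-step TD control targets.
# q_next is the row Q(s', .) of a Q-table, given as a list of floats.
# For a terminal next state, pass a row of zeros so the bootstrap term vanishes.

def sarsa_target(reward, q_next, next_action, gamma):
    """On-policy target: bootstrap from the action actually taken in s'."""
    return reward + gamma * q_next[next_action]


def q_learning_target(reward, q_next, gamma):
    """Off-policy target: bootstrap from the greedy action in s'."""
    return reward + gamma * max(q_next)


def expected_sarsa_target(reward, q_next, gamma, epsilon):
    """Bootstrap from the expectation of Q(s', .) under an epsilon-greedy policy.

    Assumption: ties for the greedy action go to the lowest index.
    """
    n = len(q_next)
    greedy = max(range(n), key=lambda a: q_next[a])
    probs = [epsilon / n] * n          # exploration mass spread uniformly
    probs[greedy] += 1.0 - epsilon     # remaining mass on the greedy action
    expected_q = sum(p * q for p, q in zip(probs, q_next))
    return reward + gamma * expected_q


def td_update(q_sa, target, alpha):
    """Shared tabular update: move Q(s, a) a fraction alpha toward the target."""
    return q_sa + alpha * (target - q_sa)


if __name__ == "__main__":
    # Illustrative numbers only: one transition with reward -1 and no discounting.
    q_next = [0.0, 1.0, -2.0, 0.5]
    print("SARSA target:", sarsa_target(-1.0, q_next, next_action=2, gamma=1.0))
    print("Q-learning target:", q_learning_target(-1.0, q_next, gamma=1.0))
    print("Expected SARSA target:",
          expected_sarsa_target(-1.0, q_next, gamma=1.0, epsilon=0.1))
```

With ε = 0 the Expected SARSA target coincides with the Q-learning target, which is one way to see why the two are usually compared on variance and learning curves rather than on the form of the update.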