Dynamic Programming
Iterative policy evaluation on 4×4 gridworld.
Dynamic programming, Monte Carlo vs TD, on-policy vs off-policy, and Q-learning — with explanations and examples.
Policy iteration and comparison with value iteration.
Gridworld with wind: the outcome of each action is shifted by a wind effect. Theory and code for policy evaluation and policy iteration.
Value iteration on 4×4 gridworld, optimal V and policy.
Code walkthrough for gridworld, iterative policy evaluation, and policy iteration.
State and transition count for 10×10 gridworld; function approximation.
BFS planner for gridworld; comparison with DP.
15 short drill problems for Volume 1: discounted return, MDPs, value functions, Bellman equations, and dynamic programming.
Review Volume 1 concepts and preview Volume 2. From dynamic programming (model-given) to model-free methods.