Chapter 7: Dynamic Programming — Policy Evaluation

Learning objectives

- Implement iterative policy evaluation (Bellman expectation updates) for a finite MDP.
- Use a gridworld with terminal states and interpret the resulting value function.
- Decide when to stop iterating (e.g. when the maximum value change falls below a threshold).

Concept and real-world RL

Policy evaluation computes \(V^\pi\) for a given policy \(\pi\). Iterative policy evaluation starts from an arbitrary \(V\) (e.g. all zeros) and repeatedly applies the Bellman expectation update: \(V(s) \leftarrow \sum_a \pi(a|s) \sum_{s',r} P(s',r|s,a)\,[r + \gamma V(s')]\). This converges to \(V^\pi\) for finite MDPs. In a gridworld, values spread outward from the terminal states (goal or trap); the result shows “how good” each cell is under the policy. This is the building block for policy iteration (evaluate, then improve the policy). ...
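The update above can be sketched in a few lines of Python. The 4×4 grid, terminals at (0,0) and (3,3), reward of -1 per step, uniform random policy, and stopping threshold are illustrative assumptions (the classic Sutton & Barto example), not code from the post:

```python
# Iterative policy evaluation on a 4x4 gridworld: terminals at (0,0) and (3,3),
# reward -1 per step, equiprobable random policy pi(a|s) = 1/4.
N = 4
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
TERMINALS = {(0, 0), (N - 1, N - 1)}

def step(s, a):
    # Deterministic move; stepping off the grid leaves the agent in place.
    r = min(max(s[0] + a[0], 0), N - 1)
    c = min(max(s[1] + a[1], 0), N - 1)
    return (r, c)

def policy_evaluation(gamma=1.0, theta=1e-6):
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}  # start from zeros
    while True:
        delta = 0.0
        for s in V:
            if s in TERMINALS:
                continue  # terminal values stay 0 and are never updated
            # Bellman expectation update under the uniform random policy
            v = sum(0.25 * (-1.0 + gamma * V[step(s, a)]) for a in ACTIONS)
            delta = max(delta, abs(v - V[s]))
            V[s] = v  # in-place (Gauss-Seidel style) sweep
        if delta < theta:  # stop when the largest update is below the threshold
            return V
```

Under this setup the values converge to the well-known result for the random policy (e.g. about -14 next to a terminal corner and -22 in the far corners).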

March 10, 2026 · 4 min · 703 words · codefrydev

Chapter 8: Dynamic Programming — Policy Iteration

Learning objectives

- Implement policy iteration: alternate policy evaluation and greedy policy improvement.
- Recognize that the policy stabilizes in a finite number of iterations for finite MDPs.
- Compare the resulting policy and value function with value iteration.

Concept and real-world RL

Policy iteration alternates two steps: (1) policy evaluation: compute \(V^\pi\) for the current policy \(\pi\); (2) policy improvement: update \(\pi\) to be greedy with respect to \(V^\pi\). The new policy is at least as good as the old one (and strictly better unless the old one was already optimal). Repeating this process converges to the optimal policy in a finite number of iterations for finite MDPs. It is a cornerstone of dynamic programming for RL; in practice we often run only a few evaluation sweeps (generalized policy iteration) or use value iteration, which interleaves evaluation and improvement in a single update. ...
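A minimal sketch of the evaluate/improve loop, again on an assumed 4×4 gridworld with terminals at (0,0) and (3,3) and reward -1 per step. The discount of 0.9 (so evaluation converges even for an initial policy that loops) and the small switching tolerance are illustrative choices, not from the post:

```python
N = 4
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
TERMINALS = {(0, 0), (N - 1, N - 1)}

def step(s, a):
    return (min(max(s[0] + a[0], 0), N - 1),
            min(max(s[1] + a[1], 0), N - 1))

def policy_iteration(gamma=0.9, theta=1e-8):
    # Start from an arbitrary deterministic policy (always "up").
    pi = {(r, c): 0 for r in range(N) for c in range(N) if (r, c) not in TERMINALS}
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        # 1) Policy evaluation for the current deterministic policy
        while True:
            delta = 0.0
            for s in pi:
                v = -1.0 + gamma * V[step(s, ACTIONS[pi[s]])]
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # 2) Greedy policy improvement with respect to V
        stable = True
        for s in pi:
            q = [-1.0 + gamma * V[step(s, ACTIONS[a])] for a in range(len(ACTIONS))]
            best = max(range(len(ACTIONS)), key=lambda a: q[a])
            if q[best] > q[pi[s]] + 1e-6:  # switch only on a clear improvement
                pi[s], stable = best, False
        if stable:  # policy unchanged, hence optimal (finite MDP)
            return pi, V
```

The tolerance in the improvement step guards against oscillating between equally good actions due to floating-point noise; it is one common way to guarantee termination.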

March 10, 2026 · 4 min · 652 words · codefrydev

Windy Gridworld

Learning objectives

- Understand the Windy Gridworld environment: movement is shifted by a column-dependent wind.
- Implement the transition model and run iterative policy evaluation and policy iteration on it.
- Compare with the standard gridworld (no wind).

Theory

Windy Gridworld (Sutton & Barto) is a rectangular grid (e.g. 7×10) with:

- States: cell positions \((row, col)\).
- Actions: up, down, left, right (four actions).
- Wind: each column has a fixed wind strength (a non-negative integer). When the agent takes an action, the resulting row is shifted upward by the wind strength of the current column (wind blows toward row 0). With rows numbered from the top, from cell \((r, c)\) the action “up” leads to \((r - 1 - \text{wind}[c], c)\), “down” to \((r + 1 - \text{wind}[c], c)\), and so on. Positions are clipped so the agent never leaves the grid.
- Terminal state: one goal cell. Typical reward: -1 per step until the goal is reached.

So the same action can lead to different next states depending on the column (wind). The MDP is still finite and deterministic given state and action (the wind is fixed per column). This makes the problem slightly harder than a plain gridworld and is a good testbed for policy evaluation and policy iteration. ...
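The transition model can be sketched directly from the description. The 7×10 layout, wind strengths, and goal cell below follow Sutton & Barto's example; treat them as illustrative assumptions rather than the post's exact code:

```python
# Windy Gridworld transition model: rows numbered from the top, wind pushes
# the agent toward row 0 by the strength of the *current* column.
ROWS, COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]   # wind strength per column
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GOAL = (3, 7)

def step(state, action):
    """Apply the action, subtract the wind from the row, clip to the grid.

    Returns (next_state, reward, done); reward is -1 per step until the goal.
    """
    r, c = state
    dr, dc = ACTIONS[action]
    nr = min(max(r + dr - WIND[c], 0), ROWS - 1)  # wind decreases the row index
    nc = min(max(c + dc, 0), COLS - 1)
    done = (nr, nc) == GOAL
    return (nr, nc), -1, done
```

For example, moving right from (3, 6) lands in (1, 7), not (3, 7): the wind of strength 2 in column 6 pushes the agent two rows up.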

March 10, 2026 · 2 min · 392 words · codefrydev

Chapter 9: Dynamic Programming — Value Iteration

Learning objectives

- Implement value iteration: repeatedly apply the Bellman optimality update for \(V\).
- Extract the optimal policy as greedy with respect to the converged \(V\).
- Relate value iteration to policy iteration (one sweep of “improvement” per state, no full evaluation).

Concept and real-world RL

Value iteration updates the state-value function using the Bellman optimality equation: \(V(s) \leftarrow \max_a \sum_{s',r} P(s',r|s,a)\,[r + \gamma V(s')]\). It does not maintain an explicit policy; after convergence, the optimal policy is greedy with respect to \(V\). Value iteration is simpler than full policy iteration (no inner evaluation loop) and converges to \(V^*\). It is used in planning when the model is known; in large or continuous spaces we approximate \(V\) or \(Q\) with function approximators and use approximate dynamic programming or model-free methods. ...
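Compared with policy evaluation, the only change in the update is replacing the expectation over \(\pi\) with a max over actions. A sketch on the same assumed 4×4 gridworld (terminals (0,0) and (3,3), reward -1 per step); grid and threshold are illustrative:

```python
N = 4
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
TERMINALS = {(0, 0), (N - 1, N - 1)}

def step(s, a):
    return (min(max(s[0] + a[0], 0), N - 1),
            min(max(s[1] + a[1], 0), N - 1))

def value_iteration(gamma=1.0, theta=1e-6):
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for s in V:
            if s in TERMINALS:
                continue
            # Bellman optimality update: max over actions, no explicit policy
            v = max(-1.0 + gamma * V[step(s, a)] for a in ACTIONS)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            break
    # Extract the optimal policy as greedy with respect to the converged V
    pi = {s: max(range(len(ACTIONS)),
                 key=lambda a, s=s: -1.0 + gamma * V[step(s, ACTIONS[a])])
          for s in V if s not in TERMINALS}
    return V, pi
```

With this reward structure the optimal value of a cell is just minus its step distance to the nearest terminal.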

March 10, 2026 · 3 min · 624 words · codefrydev

Dynamic Programming: Gridworld in Code

Learning objectives

- Implement a 4×4 gridworld environment (states, actions, transitions, rewards) in code.
- Implement iterative policy evaluation and stop when the values converge.
- Implement policy iteration (evaluate, then improve) and optionally value iteration.

Gridworld in code

- States: use a 4×4 grid. States can be \((row, col)\) pairs or a flat index. The terminal states (0,0) and (3,3) have value 0 and are not updated.
- Actions: 0=up, 1=down, 2=left, 3=right. Moving off the grid leaves the agent in place.

...
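The environment description above, using the flat-index state representation, might look like the following. The class name and layout are illustrative assumptions, not the post's code:

```python
class GridWorld:
    """4x4 gridworld with flat state indices 0..15; terminals absorb with reward 0."""
    N = 4
    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 0=up, 1=down, 2=left, 3=right

    def __init__(self):
        self.terminals = {0, self.N * self.N - 1}  # flat indices of (0,0) and (3,3)

    def to_flat(self, r, c):
        return r * self.N + c

    def step(self, s, a):
        """Return (next_state, reward) for taking action a in state s."""
        if s in self.terminals:
            return s, 0  # terminal states are absorbing and give no reward
        r, c = divmod(s, self.N)
        dr, dc = self.MOVES[a]
        nr, nc = r + dr, c + dc
        if not (0 <= nr < self.N and 0 <= nc < self.N):
            nr, nc = r, c  # moving off the grid leaves the agent in place
        return self.to_flat(nr, nc), -1
```

Usage: from state 1 (cell (0,1)), action 2 (left) reaches the terminal at flat index 0 with reward -1.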

March 10, 2026 · 2 min · 390 words · codefrydev

Chapter 10: Limitations of Dynamic Programming

Learning objectives

- Compute the number of states and transition probabilities for a small finite MDP.
- Explain why tabular methods (storing a value per state or state-action pair) do not scale.
- Describe how function approximation (e.g. linear or neural) generalizes across states.

Concept and real-world RL

Dynamic programming (policy iteration, value iteration) assumes we can store a value for every state (or state-action pair) and iterate over all of them. In a 10×10 grid that is 100 states, which is manageable. In real problems (continuous state spaces, or discrete but huge spaces such as board games or high-dimensional sensors), the number of states is enormous or infinite, so we cannot store a table. ...
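The counting exercise is just arithmetic; a quick back-of-the-envelope version for the 10×10 grid mentioned above (the four-action assumption matches the gridworld posts):

```python
# Table sizes for the tabular setting: one entry per state for V, one per
# (state, action) pair for Q, and a dense P(s'|s,a) over all triples.
n_states = 10 * 10   # the 10x10 grid from the text
n_actions = 4
v_entries = n_states                           # V(s)
q_entries = n_states * n_actions               # Q(s, a)
p_entries = n_states * n_actions * n_states    # P(s' | s, a)
print(v_entries, q_entries, p_entries)  # 100 400 40000
```

Even this tiny grid needs 40,000 entries for a dense transition table; the counts grow multiplicatively with state-space size, which is the scaling problem the chapter describes.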

March 10, 2026 · 4 min · 683 words · codefrydev

Chapter 53: Planning with Known Models

Learning objectives

- Implement a planner using breadth-first search (BFS) for a gridworld with known deterministic dynamics.
- Recover the optimal policy (path to goal) and compare with dynamic programming (value iteration) in terms of computation and result.
- Relate BFS to shortest-path planning in robot navigation.

Concept and real-world RL

When the model is known and deterministic, we can plan without learning: BFS finds the shortest path from start to goal, while value iteration computes optimal values for all states. In robot navigation (grid or graph), BFS is used for pathfinding; DP is used when we need values everywhere (e.g. for reward shaping). Both assume the model is correct; in RL we often learn the model or the value function from data. ...
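A BFS planner for deterministic grid dynamics can be sketched as below; the function name, grid dimensions, and optional wall set are illustrative assumptions:

```python
from collections import deque

def bfs_plan(rows, cols, start, goal, walls=frozenset()):
    """Return a shortest action sequence from start to goal, or None if unreachable."""
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    frontier = deque([start])
    came_from = {start: None}  # cell -> (previous cell, action taken to reach it)
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            break
        for name, (dr, dc) in moves.items():
            nxt = (cell[0] + dr, cell[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and nxt not in walls and nxt not in came_from):
                came_from[nxt] = (cell, name)
                frontier.append(nxt)
    if goal not in came_from:
        return None
    plan = []  # walk backwards from the goal, then reverse
    cell = goal
    while came_from[cell] is not None:
        cell, action = came_from[cell]
        plan.append(action)
    return plan[::-1]
```

Unlike value iteration, this returns a single start-to-goal path rather than values for every state, which is exactly the computational trade-off the chapter compares.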

March 10, 2026 · 3 min · 443 words · codefrydev

Tabular Methods

This page covers the tabular methods you need for the preliminary assessment: policy iteration and value iteration, the difference between Monte Carlo and TD, on-policy vs off-policy learning, and the Q-learning update rule. Back to Preliminary.

Why this matters for RL

When the state and action spaces are small enough, we can store one value per state (or state-action pair) and update them from experience or from the model. Dynamic programming does this when the model is known; Monte Carlo and TD do it from samples. Q-learning is the canonical off-policy TD method and is the basis of many deep RL algorithms (e.g. DQN). You need to know how these methods differ and how to write the Q-learning update. ...
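The Q-learning update mentioned above, written out as a function; the table layout (state index to list of action values) and the step sizes are illustrative assumptions:

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One off-policy TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    The target bootstraps from the greedy action in s_next, regardless of
    which action the behavior policy actually takes there (that is what
    makes it off-policy). For a terminal s_next you would drop the
    bootstrap term and use just r.
    """
    td_target = r + gamma * max(Q[s_next])
    td_error = td_target - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error
```

For example, with a zero-initialized two-state table, observing reward 1.0 for action 1 in state 0 moves Q[0][1] from 0 to alpha * 1.0 = 0.1.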

March 10, 2026 · 6 min · 1277 words · codefrydev