Phase 3 is Volume 1: Mathematical Foundations and Volume 2: Tabular Methods (chapters 1–20). Use the milestones below to confirm you are on track, then do the mini-project and take the foundations quiz.


Milestone checkpoints


Mini-project: Tabular Q-learning on a 5×5 Gridworld

Goal: Implement tabular Q-learning on a 5×5 gridworld. Plot the value function and policy; report the number of steps to reach the goal (averaged over 100 evaluation episodes) after training.

Specification:

Hints:

Rubric: You have succeeded if (a) the greedy policy from (0,0) reaches (4,4) without obvious loops, and (b) the value at (0,0) is negative and increases as you get closer to the goal.

When you are done, take the Phase 3 foundations quiz.