Covers the reinforcement learning framework and multi-armed bandits, then Markov decision processes (MDPs), value functions, Bellman equations, and dynamic programming (policy evaluation, policy iteration, value iteration). Chapters 1–10.