Learning objectives
- Know whether this course is aimed at beginners or experts (and how it serves both).
- Understand the balance between academic theory and practical implementation.
- Choose a pace that fits your background and goals.
Beginners or experts?
Beginners (to RL or programming): The course is designed to be accessible. If you have no programming experience, start with Learning path Phase 0. Use Math for RL and Prerequisites before or alongside Volume 1. The Preliminary assessment helps you check your readiness. Exercises come with hints and worked solutions so you can learn step by step. You do not need a PhD or a prior ML course, but you do need to invest time in the prerequisites if your background is light.
Experts (ML/RL experience): You can move quickly through Volumes 1–2 (a review of bandits, MDPs, DP, MC, and TD) and spend more time on Volumes 3–10 (approximation, deep RL, model-based, offline, MARL, etc.). Use the Course outline to jump to the topics you need. The 100 chapters and their exercises still serve as a structured reference and a way to check your implementations.
So: both. Beginners follow the path from zero; experts use it as a fast or targeted refresher and for the advanced volumes.
Academic or practical?
Academic: The curriculum covers the theory (MDPs, Bellman equations, convergence, policy gradient theorems) and links to the standard textbook (Sutton & Barto). You will see math and derivations where they matter.
Practical: Every chapter includes an exercise and code, for example implementing bandits, gridworld, Q-learning, and DQN. The course emphasizes implementation, environments (Gym), and runnable code.
So: both. The course pairs theory with practice. If you want only theory, read the book; if you want only recipes, you may miss the “why.”
Fast or slow-paced?
You set the pace. There are no deadlines. Suggested ranges:
- Fast (already strong in math and Python): Volumes 1–2 in a few weeks; Volumes 3–5 in 1–2 months; then pick from Volumes 6–10 as needed.
- Steady (some background): A few months for Volumes 1–3; several more months for Volumes 4–5 and selected advanced topics.
- Slow (building from zero): Start with Learning path Phases 0–2 (programming, math, prerequisites), then allow 3–6 months for Volumes 1–3, and more time for the rest.
Consistency (e.g. 30–60 min per day) matters more than total hours per week. See How to Succeed (Long Version) for more detail.