Learning objectives
- Use spaced practice and active recall to retain concepts.
- Combine reading with coding and small projects.
- Avoid common traps (passive watching, skipping exercises).
Spaced practice
Do not cram. Spread your study over time: for example, 30–60 minutes per day on the curriculum rather than one long session per week. Revisit earlier chapters when you reach later material (e.g. when you study TD learning, recall what you did for Monte Carlo methods). Spacing strengthens long-term retention.
Active recall
After reading a section, close the page and ask yourself:
- What is the update rule for Q-learning?
- What is the difference between SARSA and Q-learning?
- Why do we use a constant step size in nonstationary bandits?
Answer in your own words or write a short note. Then check the text. This is more effective than re-reading passively.
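Once you have attempted the first two questions from memory, a small sketch like the one below can serve as a check. It assumes a tabular action-value array Q indexed as Q[state, action]; the function names and the default values for alpha (step size) and gamma (discount) are illustrative choices, not taken from the course text.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Q-learning (off-policy): bootstrap from the best action in s_next."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """SARSA (on-policy): bootstrap from the action actually taken in s_next."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Hypothetical two-state, two-action table, used only for illustration.
Q = np.zeros((2, 2))
Q[1] = [1.0, 2.0]
q_after = q_learning_update(Q.copy(), s=0, a=0, r=1.0, s_next=1)
s_after = sarsa_update(Q.copy(), s=0, a=0, r=1.0, s_next=1, a_next=0)
```

The only difference between the two functions is the bootstrap term: the maximum over next actions for Q-learning, and the value of the next action the agent actually chose for SARSA.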
Code and implement
Reading is not enough. Implement the algorithms: bandits, policy evaluation, Q-learning. Run the code, break it, fix it. The exercises and worked solutions are there so you can try first and then verify. Implementation forces you to resolve ambiguities and builds intuition.
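As one possible starting point for the "run it, break it, fix it" loop, here is a minimal sketch of an epsilon-greedy bandit on a nonstationary problem. The random-walk drift, the number of arms, and all parameter defaults are assumptions made for illustration; the course exercises may set the problem up differently.

```python
import numpy as np

def run_bandit(step_size=None, n_arms=5, n_steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit whose true arm values drift over time.

    step_size=None uses sample averages (1/n); a float uses a constant step size.
    """
    rng = np.random.default_rng(seed)
    true_values = np.zeros(n_arms)
    estimates = np.zeros(n_arms)
    counts = np.zeros(n_arms, dtype=int)
    total_reward = 0.0
    for _ in range(n_steps):
        # Assumed nonstationarity: each true value takes a small random-walk step.
        true_values += rng.normal(0.0, 0.01, size=n_arms)
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))   # explore
        else:
            arm = int(np.argmax(estimates))   # exploit
        reward = rng.normal(true_values[arm], 1.0)
        counts[arm] += 1
        alpha = step_size if step_size is not None else 1.0 / counts[arm]
        # Incremental update: move the estimate toward the observed reward.
        estimates[arm] += alpha * (reward - estimates[arm])
        total_reward += reward
    return total_reward / n_steps, estimates, counts

avg_constant, _, _ = run_bandit(step_size=0.1)
avg_sample, _, _ = run_bandit(step_size=None)
```

Things to try: increase the drift, set epsilon to 0, or change the step size, and predict what will happen to the average reward before you run it.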
Project-based learning
Tie concepts to a small project: e.g. the Stock Trading Project or a custom gridworld. Projects force you to integrate multiple ideas (environment, reward, algorithm, evaluation) and make the material stick.
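For the custom gridworld option, the environment can start very small. The sketch below is a hypothetical design, not the course's own: the class name, the reset/step interface, the fixed start and goal cells, and the reward of -1 per non-terminal step are all assumptions you are free to change.

```python
class GridWorld:
    """Deterministic square grid: start at top-left, goal at bottom-right."""

    # Action index -> (row change, column change): up, down, left, right.
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, size=4):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.pos = (0, 0)

    def reset(self):
        """Return the agent to the start cell and return that state."""
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        """Apply an action; moves that would leave the grid keep the agent in place."""
        d_row, d_col = self.ACTIONS[action]
        row = min(max(self.pos[0] + d_row, 0), self.size - 1)
        col = min(max(self.pos[1] + d_col, 0), self.size - 1)
        self.pos = (row, col)
        done = self.pos == self.goal
        reward = 0.0 if done else -1.0  # assumed reward: -1 per step until the goal
        return self.pos, reward, done
```

From here, the project pieces map onto the ideas listed above: this class is the environment and reward, a tabular method supplies the algorithm, and the number of steps to reach the goal is one simple evaluation metric.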
Avoid these traps
- Only watching or reading: Without coding, you will forget quickly and struggle to implement the algorithms later.
- Skipping prerequisites: If the math or Python feels shaky, work through Math for RL and Prerequisites first. Rushing ahead leads to confusion.
- Comparing yourself to others: Everyone learns at a different pace. Focus on your own progress and completion of exercises.
See How to Succeed in this Course (Long Version) and the Learning path for a structured route through the material.