TD, SARSA, and Q-Learning in Code

Learning objectives:
- Implement TD(0) prediction in code: update \(V(s)\) after each transition.
- Implement SARSA (on-policy TD control): update \(Q(s,a)\) using the next action from the behavior policy.
- Implement Q-learning (off-policy TD control): update \(Q(s,a)\) using the max over next actions.

TD(0) prediction in code. Goal: estimate \(V^\pi\) for a fixed policy \(\pi\). Update: after each transition \((s, r, s')\): \[ V(s) \leftarrow V(s) + \alpha \bigl[ r + \gamma V(s') - V(s) \bigr] \] Use \(V(s') = 0\) if \(s'\) is terminal. ...
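The TD(0) update above can be sketched end to end in a few lines. The following is a minimal, illustrative loop on a 5-state random walk; the environment, episode count, and step size are assumptions for the example, not taken from the post:

```python
import random

def td0_prediction(episodes=5000, alpha=0.1, gamma=1.0, seed=0):
    """TD(0) prediction on a 5-state random walk.

    States 1..5 are non-terminal; 0 and 6 are terminal. Reaching
    state 6 gives reward +1, everything else gives 0. The fixed
    policy moves left or right with equal probability, so the true
    values are V(i) = i / 6.
    """
    rng = random.Random(seed)
    V = {s: 0.0 for s in range(7)}  # terminal values stay at 0
    for _ in range(episodes):
        s = 3  # start in the middle
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            # TD(0): V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V
```

With a constant step size the estimates keep fluctuating around the true values; a decaying \(\alpha\) would let them settle.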

March 10, 2026 · 2 min · 351 words · codefrydev

Chapter 14: Q-Learning (Off-Policy TD Control)

Learning objectives:
- Implement Q-learning: update \(Q(s,a)\) using the target \(r + \gamma \max_{a'} Q(s',a')\) (off-policy).
- Compare Q-learning and SARSA on Cliff Walking: paths and reward curves.
- Explain why Q-learning can learn a riskier policy (along the cliff edge) than SARSA.

Concept and real-world RL: Q-learning is off-policy: it updates \(Q(s,a)\) using the greedy next action (\(\max_{a'} Q(s',a')\)), so it learns the value of the optimal policy while you behave with \(\epsilon\)-greedy (or any exploratory policy). The update is \(Q(s,a) \leftarrow Q(s,a) + \alpha [r + \gamma \max_{a'} Q(s',a') - Q(s,a)]\). On Cliff Walking, Q-learning often converges to the shortest path along the cliff (high reward when there is no exploration, but dangerous if an occasional random step is taken). SARSA learns the value of the policy actually followed, exploration included, and tends to stay away from the cliff. In practice, Q-learning is simple and widely used (e.g. DQN); when safety matters, on-policy or conservative methods may be preferred. ...
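The off-policy update can be sketched on a toy problem. The corridor environment and hyperparameters below are illustrative assumptions for compactness; the post's experiments use Cliff Walking:

```python
import random

def q_learning(episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Q-learning on a 5-state corridor: states 0..4, terminal at 4.

    Actions: 0 = left, 1 = right; every step costs -1, so the
    optimal value of the start state is -4 (four steps right).
    """
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}
    for _ in range(episodes):
        s = 0
        while s != 4:
            # behave epsilon-greedily...
            if rng.random() < epsilon:
                a = rng.choice((0, 1))
            else:
                a = max((0, 1), key=lambda act: Q[(s, act)])
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = -1.0
            # ...but bootstrap on the greedy next action (off-policy)
            best_next = max(Q[(s_next, 0)], Q[(s_next, 1)])  # stays 0 at terminal
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```

Swapping `best_next` for the Q-value of the action actually taken next would turn this into SARSA.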

March 10, 2026 · 3 min · 589 words · codefrydev

Chapter 15: Expected SARSA

Learning objectives:
- Implement Expected SARSA: use \(\sum_{a'} \pi(a'|s') Q(s',a')\) as the target instead of \(\max_{a'} Q(s',a')\) or \(Q(s',a')\).
- Relate Expected SARSA to SARSA (on-policy) and Q-learning (max); it can be used on- or off-policy depending on \(\pi\).
- Compare update variance and learning curves with Q-learning.

Concept and real-world RL: Expected SARSA uses the expected next-action value under a policy \(\pi\): target = \(r + \gamma \sum_{a'} \pi(a'|s') Q(s',a')\). For \(\epsilon\)-greedy \(\pi\) this is \(r + \gamma \bigl[(1-\epsilon) \max_{a'} Q(s',a') + \frac{\epsilon}{|\mathcal{A}|} \sum_{a'} Q(s',a')\bigr]\). It reduces the variance of the update compared to SARSA, which uses a single sampled \(Q(s',a')\), and can be more stable. When \(\pi\) is greedy, Expected SARSA becomes Q-learning. In practice it is a middle ground between SARSA and Q-learning and appears in some deep RL variants. ...
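The \(\epsilon\)-greedy expectation is a one-liner; a minimal sketch of just the target computation (the function name is mine, not from the post):

```python
def expected_sarsa_target(r, q_next, epsilon, gamma=1.0):
    """Expected SARSA target under an epsilon-greedy policy.

    q_next: Q(s', a') for every action a' in the next state.
    Expectation: (1 - eps) * max(q_next) + (eps / |A|) * sum(q_next).
    """
    n = len(q_next)
    expected = (1 - epsilon) * max(q_next) + (epsilon / n) * sum(q_next)
    return r + gamma * expected
```

With `epsilon=0` the target reduces to the Q-learning target; with `epsilon=1` it averages uniformly over the actions.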

March 10, 2026 · 3 min · 618 words · codefrydev

Chapter 19: Hyperparameter Tuning in Tabular RL

Learning objectives:
- Run a grid search over learning rate \(\alpha\) and exploration rate \(\epsilon\) for Q-learning.
- Aggregate results over multiple trials (e.g. mean reward per episode) and visualize them with a heatmap.
- Interpret which hyperparameter combinations work best and why.

Concept and real-world RL: Hyperparameters (e.g. \(\alpha\), \(\epsilon\), \(\gamma\)) strongly affect learning speed and final performance. Grid search tries every combination in a predefined set; it is simple but costly when there are many parameters. In practice, RL tuning often uses grid search for 2–3 key parameters, or Bayesian optimization / bandit-based tuning for larger spaces. Reporting mean and standard deviation over multiple seeds is essential because RL is noisy. Heatmaps (e.g. \(\alpha\) vs \(\epsilon\) with color = mean reward) make good and bad regions visible at a glance. ...
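The loop structure can be sketched as follows. The corridor environment, parameter grids, and seed set are illustrative assumptions; the resulting dict of \((\alpha, \epsilon)\) cells is exactly what a heatmap would plot:

```python
import random
from itertools import product
from statistics import mean

def run_trial(alpha, epsilon, seed, episodes=200, gamma=1.0):
    """One Q-learning run on a 5-state corridor (terminal at state 4,
    reward -1 per step); returns the mean reward per episode."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}
    returns = []
    for _ in range(episodes):
        s, total = 0, 0.0
        while s != 4:
            if rng.random() < epsilon:
                a = rng.choice((0, 1))
            else:
                a = max((0, 1), key=lambda act: Q[(s, act)])
            s_next = max(0, s - 1) if a == 0 else s + 1
            total -= 1.0
            # standard Q-learning update with reward -1
            best_next = max(Q[(s_next, 0)], Q[(s_next, 1)])
            Q[(s, a)] += alpha * (-1.0 + gamma * best_next - Q[(s, a)])
            s = s_next
        returns.append(total)
    return mean(returns)

def grid_search(alphas, epsilons, seeds=(0, 1, 2)):
    """Mean reward for every (alpha, epsilon) cell, averaged over seeds."""
    return {(a, e): mean(run_trial(a, e, s) for s in seeds)
            for a, e in product(alphas, epsilons)}
```

Averaging over seeds inside each cell is the "multiple trials" step; plotting the dict with \(\alpha\) on one axis and \(\epsilon\) on the other gives the heatmap.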

March 10, 2026 · 3 min · 608 words · codefrydev

Phase 3 Foundations Quiz

Use this quiz after completing Volume 1 and Volume 2 (or the Phase 3 mini-project). If you can answer at least 12 of 15 questions correctly, you are ready for Phase 4 and Volume 3.

1. RL framework. Q: Name the four main components of an RL system (agent, environment, and two more). What is a state? Answer: agent, environment, action, reward. A state is a representation of the current situation that the agent uses to choose actions.

2. Return. Q: For rewards [0, 0, 1] and \(\gamma = 0.9\), compute the discounted return \(G_0\) from step 0. ...
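As a self-check on question 2, the discounted return can be computed with a throwaway helper (the function name is mine, not from the quiz):

```python
def discounted_return(rewards, gamma):
    """G_0 = sum over t of gamma^t * r_t, rewards indexed from step 0."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Rewards [0, 0, 1] with gamma = 0.9:
# G_0 = 0 + 0.9 * 0 + 0.81 * 1 = 0.81
```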

March 10, 2026 · 5 min · 876 words · codefrydev

Stock Trading Project with Reinforcement Learning

Beginners, halt! If you skipped ahead: this project assumes you have completed the core curriculum through temporal-difference learning and approximation methods (e.g. Volume 2 and Volume 3, or equivalent). You should understand Q-learning, state and action spaces, and at least linear function approximation. If you have not done that yet, start with the Learning path and Course outline.

Introduction: This project walks you through building a simplified RL-based stock trading agent: you define an environment (state = market/position info, actions = buy/sell/hold), a reward (e.g. profit or risk-adjusted return), and train an agent using Q-learning with function approximation. The goal is to understand how to go from theory (Q-learning, function approximation) to a concrete design and code. ...

March 10, 2026 · 4 min · 717 words · codefrydev

Tabular Methods

This page covers the tabular methods you need for the preliminary assessment: policy iteration and value iteration, the difference between Monte Carlo and TD, on-policy vs off-policy learning, and the Q-learning update rule. Back to Preliminary.

Why this matters for RL: When the state and action spaces are small enough, we can store one value per state (or per state–action pair) and update it from experience or from the model. Dynamic programming does this when we know the model; Monte Carlo and TD do it from samples. Q-learning is the canonical off-policy TD method and is the basis of many deep RL algorithms (e.g. DQN). You need to know how these methods differ and how to write the Q-learning update. ...

March 10, 2026 · 6 min · 1277 words · codefrydev