Q-Learning
Dynamic programming, Monte Carlo vs TD, on-policy vs off-policy, and Q-learning — with explanations and examples.
10–15 questions on MDPs, the Bellman equations, MC vs TD, and SARSA vs Q-learning. Solutions included.
Code walkthrough for TD(0) prediction, SARSA, and Q-learning (tabular).
Q-learning on Cliff Walking; compare with SARSA.
Expected SARSA vs Q-learning; variance and learning curves.
Grid search over α and ε for Q-learning on Cliff Walking.
15 short drill problems for Volume 2: Monte Carlo, TD(0), SARSA, Q-learning, and n-step methods.
Apply Q-learning and function approximation to a simplified stock trading environment—data, Q-model, design, and code.
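For orientation, the difference between SARSA and Q-learning that several of the items above refer to can be sketched in a few lines. This is a minimal tabular sketch, not the course's own code: the function names, the list-of-lists Q-table layout, and the default values of `alpha` and `gamma` are all assumptions made for illustration.

```python
import random

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.5, gamma=1.0):
    """Off-policy TD update: bootstrap from the best action value in s_next."""
    # Hypothetical layout: Q[state][action] is a float.
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q

def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.5, gamma=1.0):
    """On-policy TD update: bootstrap from the action actually taken next."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q

def epsilon_greedy(Q, s, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    n_actions = len(Q[s])
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[s][a])
```

The only difference between the two updates is the bootstrap term: Q-learning uses the maximum over next-state action values regardless of what the behaviour policy does next, while SARSA uses the value of the action the policy actually selects.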