Volume 2: Tabular Methods & Classic Algorithms

Learning objectives Run a grid search over learning rate \(\alpha\) and exploration \(\epsilon\) for Q-learning. Aggregate results over multiple trials (e.g. mean reward per episode) and visualize with a heatmap. Interpret which hyperparameter combinations work best and why. Concept and real-world RL Hyperparameters (e.g. \(\alpha\), \(\epsilon\), \(\gamma\)) strongly affect learning speed and final performance. Grid search tries every combination in a predefined set; it is simple but costly when there are many parameters. In practice, RL tuning often uses grid search for 2–3 key parameters, or Bayesian optimization / bandit-based tuning for larger spaces. Reporting mean and std over multiple seeds is essential because RL is noisy. Heatmaps (e.g. \(\alpha\) vs \(\epsilon\) with color = mean reward) make good and bad regions visible at a glance. ...

Volume 2: Tabular Methods & Classic Algorithms

Chapter 19: Hyperparameter Tuning in Tabular RL

Chapter 20: The Limits of Tabular Methods