Chapter 19: Hyperparameter Tuning in Tabular RL

Learning objectives

- Run a grid search over learning rate \(\alpha\) and exploration rate \(\epsilon\) for Q-learning.
- Aggregate results over multiple trials (e.g. mean reward per episode) and visualize them with a heatmap.
- Interpret which hyperparameter combinations work best, and why.

Concept and real-world RL

- Hyperparameters (e.g. \(\alpha\), \(\epsilon\), \(\gamma\)) strongly affect learning speed and final performance.
- Grid search tries every combination in a predefined set; it is simple but costly when there are many parameters.
- In practice, RL tuning often uses grid search for 2–3 key parameters, or Bayesian optimization / bandit-based tuning for larger spaces.
- Reporting the mean and standard deviation over multiple seeds is essential because RL is noisy.
- Heatmaps (e.g. \(\alpha\) vs. \(\epsilon\) with color = mean reward) make good and bad regions visible at a glance.

...
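The grid-search-plus-seed-averaging recipe above can be sketched as follows. This is a minimal illustration, not the chapter's actual code: the tiny chain environment, the specific \(\alpha\)/\(\epsilon\) values, the seed count, and the helper name `run_q_learning` are all assumptions chosen to keep the example small and fast.

```python
import numpy as np

def run_q_learning(alpha, epsilon, seed, n_states=5, n_episodes=200, gamma=0.9):
    """Train tabular Q-learning on a tiny chain MDP (hypothetical toy environment).

    Actions: 0 = left, 1 = right; reward 1.0 only on reaching the rightmost state.
    Returns the mean reward per episode over the final 50 episodes.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    episode_returns = []
    for _ in range(n_episodes):
        s, total = 0, 0.0
        for _ in range(50):  # cap episode length
            # Epsilon-greedy action selection.
            a = int(rng.integers(2)) if rng.random() < epsilon else int(np.argmax(Q[s]))
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            # Standard Q-learning update; bootstrap only if the episode continues.
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s, total = s_next, total + r
            if done:
                break
        episode_returns.append(total)
    return float(np.mean(episode_returns[-50:]))

# Grid search: every (alpha, epsilon) pair, averaged over several seeds.
alphas = [0.1, 0.5]
epsilons = [0.1, 0.3]
seeds = [0, 1, 2]
results = np.zeros((len(alphas), len(epsilons)))
for i, alpha in enumerate(alphas):
    for j, eps in enumerate(epsilons):
        results[i, j] = np.mean([run_q_learning(alpha, eps, s) for s in seeds])
print(results)
```

The resulting `results` matrix is exactly the data a heatmap needs; with matplotlib one could render it via `plt.imshow(results)` with `alphas` and `epsilons` as the tick labels, so strong and weak regions of the grid stand out immediately.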

March 10, 2026 · 3 min · 608 words · codefrydev