Chapter 19: Hyperparameter Tuning in Tabular RL

Learning objectives

- Run a grid search over the learning rate \(\alpha\) and exploration rate \(\epsilon\) for Q-learning.
- Aggregate results over multiple trials (e.g. mean reward per episode) and visualize them with a heatmap.
- Interpret which hyperparameter combinations work best, and why.

Concept and real-world RL

- Hyperparameters (e.g. \(\alpha\), \(\epsilon\), \(\gamma\)) strongly affect learning speed and final performance.
- Grid search tries every combination in a predefined set; it is simple but costly when there are many parameters.
- In practice, RL tuning often uses grid search for 2–3 key parameters, or Bayesian optimization / bandit-based tuning for larger spaces.
- Reporting the mean and standard deviation over multiple seeds is essential because RL training is noisy.
- Heatmaps (e.g. \(\alpha\) vs. \(\epsilon\), with color = mean reward) make good and bad regions visible at a glance.

...
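The grid-search-plus-seed-averaging loop described above can be sketched as follows. This is a minimal illustration, not the chapter's actual code: the 5-state chain environment, the grid values, and the episode/seed counts are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def run_q_learning(alpha, epsilon, n_episodes=200, seed=0):
    """Tabular Q-learning on a toy 5-state chain (hypothetical example env).

    Action 0 moves left, action 1 moves right; reaching state 4 gives
    reward +1 and ends the episode. Returns mean reward per episode.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, gamma = 5, 2, 0.99
    Q = np.zeros((n_states, n_actions))
    rewards = []
    for _ in range(n_episodes):
        s, total = 0, 0.0
        for _ in range(50):  # step cap per episode
            # epsilon-greedy action selection
            a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update with learning rate alpha
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            total += r
            s = s_next
            if r > 0:
                break
        rewards.append(total)
    return float(np.mean(rewards))

# Grid search: average over several seeds for each (alpha, epsilon) pair.
alphas = [0.05, 0.1, 0.5]
epsilons = [0.01, 0.1, 0.3]
results = np.array([[np.mean([run_q_learning(a, e, seed=s) for s in range(5)])
                     for e in epsilons] for a in alphas])
# results[i, j] holds the seed-averaged reward for (alphas[i], epsilons[j]);
# a heatmap is then e.g. plt.imshow(results) with alpha/epsilon tick labels.
```

The key point is the nested structure: the innermost loop varies only the seed, so each cell of the result matrix is directly comparable, and the heatmap shows the mean rather than a single noisy run.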

March 10, 2026 · 3 min · 608 words · codefrydev

Chapter 50: Advanced Hyperparameter Tuning

Learning objectives

- Use Weights & Biases (or a similar tool) to run a hyperparameter sweep for SAC on your custom environment (or a standard one).
- Sweep over the learning rate, entropy coefficient (or auto-\(\alpha\) target), and network size (hidden dims).
- Visualize the effect on final return and learning speed (e.g. steps to reach a threshold).

Concept and real-world RL

- Hyperparameter tuning is essential for getting the best out of RL algorithms; sweeps (grid or random search over learning rate, network size, etc.) are standard in research and industry.
- Weights & Biases (wandb) logs metrics and supports sweep configs; similar tools include MLflow, Optuna, and Ray Tune.
- In robot control and game AI, tuning the learning rate and entropy coefficient (or clip range for PPO) often has the largest impact.
- Automating sweeps saves time and makes results reproducible.

...
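A sweep like the one described above is driven by a configuration dict. The sketch below shows a plausible shape for it; the parameter names (`lr`, `ent_coef`, `hidden_dim`), the metric name, and the value grids are assumptions for illustration, and the launch calls require a wandb account plus a `train()` function of your own that reads `wandb.config` and logs the metric.

```python
# Hypothetical sweep configuration for SAC; adapt names and
# ranges to your own training script.
sweep_config = {
    "method": "random",  # "grid", "random", or "bayes"
    "metric": {"name": "eval/return", "goal": "maximize"},
    "parameters": {
        "lr": {"values": [1e-4, 3e-4, 1e-3]},
        "ent_coef": {"values": ["auto", 0.05, 0.2]},   # fixed alpha or auto-tuned target
        "hidden_dim": {"values": [64, 128, 256]},      # actor/critic hidden layer width
    },
}

# Launching the sweep (commented out because it needs a wandb login
# and a train() function that logs "eval/return"):
# import wandb
# sweep_id = wandb.sweep(sweep_config, project="sac-tuning")
# wandb.agent(sweep_id, function=train, count=20)
```

With random search, each agent run samples one value per parameter; the wandb dashboard then plots `eval/return` against each hyperparameter across runs, which is how the "effect on final return and learning speed" comparison is made.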

March 10, 2026 · 3 min · 473 words · codefrydev