Chapter 50: Advanced Hyperparameter Tuning
Learning objectives

- Use Weights & Biases (or a similar tool) to run a hyperparameter sweep for SAC on your custom environment (or a standard one).
- Sweep over the learning rate, the entropy coefficient (or the auto-\(\alpha\) target), and the network size (hidden dimensions).
- Visualize the effect on final return and on learning speed (e.g., steps needed to reach a return threshold).

Concept and real-world RL

Hyperparameter tuning is essential for getting the best out of RL algorithms; sweeps (grid or random search over learning rate, network size, etc.) are standard practice in research and industry. Weights & Biases (wandb) logs metrics and supports sweep configurations; similar tools include MLflow, Optuna, and Ray Tune. In robot control and game AI, tuning the learning rate and the entropy coefficient (or the clip range for PPO) often has the largest impact. Automating sweeps saves time and makes results reproducible. ...
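A sweep over learning rate, entropy coefficient, and hidden dimensions can be expressed as a wandb-style sweep configuration. The sketch below is a minimal, hypothetical example: the parameter names (`learning_rate`, `alpha`, `hidden_dim`) and metric name (`final_return`) are assumptions, not a fixed API, and `sample_config` is a small local helper that mimics wandb's random-search sampling so the config can be inspected without a wandb account.

```python
import random

# Hypothetical wandb-style sweep config for SAC; the parameter and
# metric names here are illustrative assumptions, not a required schema.
sweep_config = {
    "method": "random",
    "metric": {"name": "final_return", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"values": [1e-4, 3e-4, 1e-3]},
        "alpha": {"values": [0.05, 0.1, 0.2]},      # SAC entropy coefficient
        "hidden_dim": {"values": [64, 128, 256]},   # width of policy/Q networks
    },
}

def sample_config(config, rng=random):
    """Draw one hyperparameter setting, mimicking random search over the grid."""
    return {name: rng.choice(spec["values"])
            for name, spec in config["parameters"].items()}

trial = sample_config(sweep_config)
print(trial)  # one randomly sampled hyperparameter setting
```

With wandb itself, the same dictionary would typically be registered via `wandb.sweep(...)` and executed by `wandb.agent(...)` calling your training function, which reads the sampled values from `wandb.config`.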
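The "steps to reach a threshold" learning-speed metric mentioned above can be computed from a logged evaluation curve. A minimal sketch, assuming the curve is available as parallel lists of environment steps and evaluation returns (the names and data here are illustrative):

```python
def steps_to_threshold(steps, returns, threshold):
    """Return the first logged environment step at which the evaluation
    return reaches the threshold, or None if it never does."""
    for step, ret in zip(steps, returns):
        if ret >= threshold:
            return step
    return None

# Example (made-up) learning curve: evaluation return logged every 10k steps.
steps = [10_000, 20_000, 30_000, 40_000]
returns = [-50.0, 120.0, 210.0, 250.0]
print(steps_to_threshold(steps, returns, 200.0))  # → 30000
```

Logging this single number per sweep run makes learning speed directly comparable across hyperparameter settings in the sweep dashboard.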