Linear function approximation, neural networks for RL, Deep Q-Networks (DQN), experience replay, target networks, Double DQN, Dueling DQN, prioritized replay, Noisy Networks, and Rainbow. Chapters 21–30.
Chapter 29: Noisy Networks for Exploration
Learning objectives:
- Implement noisy linear layers: \(y = (W + \sigma_W \odot \epsilon_W) x + (b + \sigma_b \odot \epsilon_b)\), where \(\epsilon\) is random noise (e.g. Gaussian) and the \(\sigma\) terms are learnable parameters.
- Use factorized Gaussian noise to reduce the number of random samples: \(\epsilon_{i,j} = f(\epsilon_i) \cdot f(\epsilon_j)\), with a scaling function such as \(f(x) = \operatorname{sgn}(x)\sqrt{|x|}\) so that each \(\epsilon_{i,j}\) has zero mean. This needs only in + out noise samples per layer instead of in × out + out.
- Compare exploration behavior (e.g. unique states visited, or variance of actions over time) against an \(\epsilon\)-greedy DQN.
Concept and real-world RL ...
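The noisy linear layer and factorized noise described above can be sketched in plain NumPy. This is a minimal illustration, not a training-ready layer: the class name `NoisyLinear`, the initialization constant `sigma0`, and the helper `f` follow common conventions but are assumptions here, and gradient updates for the \(\mu\) and \(\sigma\) parameters are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Scaling function used for factorized noise: f(x) = sgn(x) * sqrt(|x|)
    return np.sign(x) * np.sqrt(np.abs(x))

class NoisyLinear:
    """Sketch of a noisy linear layer with factorized Gaussian noise.

    y = (mu_w + sigma_w * eps_w) @ x + (mu_b + sigma_b * eps_b)
    """

    def __init__(self, in_features, out_features, sigma0=0.5):
        bound = 1.0 / np.sqrt(in_features)
        # Learnable means, initialized uniformly in [-bound, bound]
        self.mu_w = rng.uniform(-bound, bound, size=(out_features, in_features))
        self.mu_b = rng.uniform(-bound, bound, size=out_features)
        # Learnable noise scales (constant initialization is an assumption)
        self.sigma_w = np.full((out_features, in_features), sigma0 * bound)
        self.sigma_b = np.full(out_features, sigma0 * bound)
        self.in_features = in_features
        self.out_features = out_features
        self.sample_noise()

    def sample_noise(self):
        # Factorized noise: draw in + out scalars instead of in*out + out,
        # then form the weight noise as an outer product.
        eps_in = f(rng.standard_normal(self.in_features))
        eps_out = f(rng.standard_normal(self.out_features))
        self.eps_w = np.outer(eps_out, eps_in)  # eps_{i,j} = f(eps_i) * f(eps_j)
        self.eps_b = eps_out

    def forward(self, x):
        # Perturbed weights and biases for this forward pass
        w = self.mu_w + self.sigma_w * self.eps_w
        b = self.mu_b + self.sigma_b * self.eps_b
        return x @ w.T + b

layer = NoisyLinear(4, 2)
out = layer.forward(np.ones(4))
print(out.shape)  # (2,)
```

Resampling the noise between forward passes changes the layer's output, which is what drives exploration: the agent's greedy action under the perturbed weights varies from step to step without any explicit \(\epsilon\)-greedy schedule.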