Volume 3: Value Function Approximation & Deep Q-Learning
Chapters 21–30 — Linear FA, neural nets for RL, DQN, replay, target networks, DDQN, Dueling, PER, NoisyNet, Rainbow.
Linear FA with tile coding for MountainCar; semi-gradient SARSA.
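A minimal sketch of the chapter's ingredients, assuming MountainCar's standard state bounds; the tiling layout, function names, and step size below are illustrative choices, not the chapter's exact code:

```python
import numpy as np

def tile_features(pos, vel, n_tilings=8, n_tiles=8,
                  pos_range=(-1.2, 0.6), vel_range=(-0.07, 0.07)):
    """Return the active-tile indices for a (position, velocity) pair.

    Each of the n_tilings grids is shifted by a fraction of one tile,
    so the binary feature vector has exactly n_tilings active entries."""
    feats = []
    pos_w = (pos_range[1] - pos_range[0]) / n_tiles
    vel_w = (vel_range[1] - vel_range[0]) / n_tiles
    for t in range(n_tilings):
        off = t / n_tilings  # fractional offset per tiling
        pi = min(max(int((pos - pos_range[0]) / pos_w + off), 0), n_tiles)
        vi = min(max(int((vel - vel_range[0]) / vel_w + off), 0), n_tiles)
        feats.append(t * (n_tiles + 1) ** 2 + pi * (n_tiles + 1) + vi)
    return feats

n_actions, n_feats = 3, 8 * 9 * 9
w = np.zeros((n_actions, n_feats))  # one linear weight vector per action

def q(feats, a):
    return w[a, feats].sum()  # linear value: sum of active-tile weights

# One semi-gradient SARSA update; the gradient of a linear FA is just
# the (binary) feature vector, so only the active tiles change.
alpha, gamma = 0.1 / 8, 0.99
s, a = tile_features(-0.5, 0.0), 2
s2, a2 = tile_features(-0.48, 0.01), 2
delta = -1.0 + gamma * q(s2, a2) - q(s, a)  # reward is -1 per step
w[a, s] += alpha * delta
```

Dividing alpha by the number of tilings keeps the effective step size stable as tilings are added.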
Designing state and state-action features for linear value approximation.
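One common construction for state-action features under a linear approximator is to give each action its own copy of the state features; the helper name below is illustrative:

```python
import numpy as np

def state_action_features(x, a, n_actions):
    """Place the state features x into action a's slot and zeros
    elsewhere, so phi(s, a) has length n_actions * len(x) and a single
    weight vector w yields q(s, a) = w . phi(s, a)."""
    phi = np.zeros(n_actions * len(x))
    phi[a * len(x):(a + 1) * len(x)] = x
    return phi

x = np.array([1.0, 0.5, -0.2])        # state features
phi = state_action_features(x, 1, 3)  # action 1 of 3
```

This is equivalent to keeping a separate weight vector per action, which is how the tile-coding example stores its weights.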
The CartPole (Inverted Pendulum) environment: state, actions, and solving it with value-based or policy-based methods.
Two-hidden-layer PyTorch network for Q-values; MSE loss.
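The chapter builds the network in PyTorch; as a dependency-free stand-in, the same two-hidden-layer forward pass and MSE loss can be sketched in numpy (layer width 64 and the ReLU choice are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def init(n_in, n_hidden, n_out):
    """Initialize three weight matrices with 1/sqrt(fan-in) scaling."""
    s = lambda a, b: rng.normal(0, 1 / np.sqrt(a), (a, b))
    return [s(n_in, n_hidden), s(n_hidden, n_hidden), s(n_hidden, n_out)]

def forward(params, x):
    W1, W2, W3 = params
    h1 = np.maximum(0.0, x @ W1)   # hidden layer 1, ReLU
    h2 = np.maximum(0.0, h1 @ W2)  # hidden layer 2, ReLU
    return h2 @ W3                 # linear head: one Q-value per action

def mse(pred, target):
    return np.mean((pred - target) ** 2)

params = init(4, 64, 2)                  # CartPole: 4-dim state, 2 actions
qvals = forward(params, np.ones((5, 4)))  # batch of 5 states
loss = mse(qvals[:, 0], np.zeros(5))      # regress chosen-action Q to targets
```

The key design point carries over to the PyTorch version: the network outputs all action values in one forward pass, and the loss is taken only on the action actually executed.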
DQN for CartPole with replay and target network.
Replay buffer class with push and sample.
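The buffer described here can be sketched with a stdlib deque (class and method names match the chapter's description; capacity and batch size are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer of (s, a, r, s', done) transitions;
    the deque evicts the oldest transition once capacity is reached."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s2, done):
        self.buf.append((s, a, r, s2, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive transitions.
        batch = random.sample(self.buf, batch_size)
        return list(zip(*batch))  # columns: states, actions, rewards, ...

    def __len__(self):
        return len(self.buf)

buf = ReplayBuffer(capacity=1000)
for t in range(50):
    buf.push(t, 0, 1.0, t + 1, False)
states, actions, rewards, next_states, dones = buf.sample(8)
```

Returning column tuples makes it easy to convert each column to a tensor in one call inside the training loop.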
Hard vs soft target updates in DQN.
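The two update rules compared in this chapter, sketched on toy parameter dicts standing in for state_dicts (tau and the function names are illustrative):

```python
def hard_update(target, online):
    """Copy the online weights into the target network every C steps."""
    target.update(online)

def soft_update(target, online, tau=0.005):
    """Polyak averaging every step: target <- tau*online + (1-tau)*target."""
    for k in online:
        target[k] = tau * online[k] + (1 - tau) * target[k]

online = {"w": 1.0}
target = {"w": 0.0}
soft_update(target, online, tau=0.1)
after_soft = target["w"]      # moved 10% of the way toward the online net
hard_update(target, online)   # now an exact copy
```

Hard updates give a target that is stale but perfectly stable between copies; soft updates give a target that tracks the online net smoothly.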
Double DQN: online selects, target evaluates; compare with DQN.
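The "online selects, target evaluates" rule amounts to a one-line change in the target computation; the toy Q-values below are chosen to show the maximization bias that Double DQN removes (function names are illustrative):

```python
import numpy as np

def dqn_target(r, q_next_target, gamma=0.99, done=False):
    """Vanilla DQN: the target net both selects and evaluates the max,
    so any overestimated action inflates the target."""
    return r + (0.0 if done else gamma * np.max(q_next_target))

def double_dqn_target(r, q_next_online, q_next_target, gamma=0.99, done=False):
    """Double DQN: the online net picks argmax-a, the target net scores it."""
    a_star = int(np.argmax(q_next_online))
    return r + (0.0 if done else gamma * q_next_target[a_star])

q_online = np.array([1.0, 2.0])  # online net prefers action 1
q_target = np.array([5.0, 0.5])  # target net overestimates action 0
y_dqn = dqn_target(1.0, q_target)                    # uses the inflated 5.0
y_ddqn = double_dqn_target(1.0, q_online, q_target)  # evaluates action 1
```

Because selection and evaluation errors no longer compound, the Double DQN target here is far smaller than the DQN one.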
Dueling architecture V(s) + A(s,a); compare with DQN.
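The dueling head's combining step can be shown in isolation; subtracting the mean advantage is the identifiability trick (the stream outputs below are made-up numbers):

```python
import numpy as np

def dueling_q(v, adv):
    """Combine scalar V(s) and per-action A(s,a) into Q(s,a):
    Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a').
    Without the mean subtraction, (V + c, A - c) would give the same Q,
    leaving the two streams unidentifiable."""
    return v + adv - adv.mean(axis=-1, keepdims=True)

v = np.array([[2.0]])               # value-stream output for one state
adv = np.array([[1.0, -1.0, 0.0]])  # advantage stream, 3 actions
qvals = dueling_q(v, adv)
```

Note the combination is monotone in the advantages, so the greedy action is unchanged; what the architecture buys is a shared estimate of V(s) across all actions.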
Sum-tree prioritized buffer with TD-error priorities; importance-sampling weights to correct the sampling bias.
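A compact sketch of the data structure, assuming priorities p_i = |delta_i|^alpha with alpha = 0.6 and beta = 0.4 (standard PER hyperparameters; the class layout is illustrative):

```python
import random

class SumTree:
    """Binary sum tree: leaves hold priorities, internal nodes hold
    subtree sums, so sampling proportionally to priority is O(log n)."""
    def __init__(self, capacity):
        self.n = capacity
        self.tree = [0.0] * (2 * capacity)  # leaves live at [n, 2n)

    def update(self, i, p):
        j = i + self.n
        self.tree[j] = p
        j //= 2
        while j >= 1:  # propagate the changed sum up to the root
            self.tree[j] = self.tree[2 * j] + self.tree[2 * j + 1]
            j //= 2

    def total(self):
        return self.tree[1]

    def sample(self):
        """Walk down from the root, descending into whichever child's
        subtree contains the sampled mass; returns a leaf index."""
        v, j = random.uniform(0, self.total()), 1
        while j < self.n:
            if v <= self.tree[2 * j]:
                j = 2 * j
            else:
                v -= self.tree[2 * j]
                j = 2 * j + 1
        return j - self.n

tree = SumTree(4)
td_errors = [0.5, 2.0, 0.1, 1.0]
for i, td in enumerate(td_errors):
    tree.update(i, td ** 0.6)   # p_i = |delta_i|^alpha
i = tree.sample()
p = tree.tree[i + tree.n] / tree.total()  # P(i), the sampling probability
w = (1.0 / (4 * p)) ** 0.4                # IS weight (N * P(i))^{-beta}
```

After each learning step the sampled transitions' priorities are refreshed via update() with their new TD errors.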
Noisy linear layers with factorized Gaussian; compare with ε-greedy.
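A numpy sketch of one noisy linear layer with factorized Gaussian noise, assuming the standard sigma_0 = 0.5 initialization; the chapter's version would be a torch.nn module, and the names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Factorized-noise squashing function: f(x) = sgn(x) * sqrt(|x|)."""
    return np.sign(x) * np.sqrt(np.abs(x))

def noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b):
    """Weights are mu_w + sigma_w * (f(eps_in) outer f(eps_out)), and
    similarly for the bias: only n_in + n_out noise samples per forward
    pass instead of n_in * n_out. The learned sigmas let the network
    anneal its own exploration, replacing an epsilon-greedy schedule."""
    n_in, n_out = mu_w.shape
    eps_in, eps_out = f(rng.normal(size=n_in)), f(rng.normal(size=n_out))
    w = mu_w + sigma_w * np.outer(eps_in, eps_out)
    b = mu_b + sigma_b * eps_out
    return x @ w + b

n_in, n_out = 4, 2
mu_w = rng.normal(0, 1 / np.sqrt(n_in), (n_in, n_out))
sigma_w = np.full((n_in, n_out), 0.5 / np.sqrt(n_in))  # sigma_0 = 0.5
mu_b, sigma_b = np.zeros(n_out), np.full(n_out, 0.5 / np.sqrt(n_in))
y = noisy_linear(np.ones(n_in), mu_w, sigma_w, mu_b, sigma_b)
```

Because the noise is resampled on every forward pass, two calls on the same input give different outputs, which is exactly the exploration signal.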
Combine DDQN, Dueling, PER, NoisyNet, multi-step; train on Pong.
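The full Pong agent is a training loop over the components above; the one piece not yet shown is the multi-step target, sketched here (the function name and the 3-step / gamma = 0.9 numbers are illustrative):

```python
def n_step_return(rewards, gamma, bootstrap_q, done=False):
    """Multi-step target: sum_{k=0}^{n-1} gamma^k r_k + gamma^n * Q(s_n),
    where Q(s_n) is the bootstrap value (e.g. the Double DQN estimate
    at the n-th state). The bootstrap term is dropped if the episode
    terminated inside the n-step window."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += gamma ** k * r
    if not done:
        g += gamma ** len(rewards) * bootstrap_q
    return g

# 3-step return with gamma = 0.9 and a bootstrap value of 10.0
g = n_step_return([1.0, 0.0, 1.0], 0.9, 10.0)
```

In a Rainbow-style agent, transitions are stored with their precomputed n-step reward and n-th next state, so the replay buffer and loss need almost no changes.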
15 short drill problems for Volume 3: linear FA, semi-gradient TD, DQN, replay buffer, target network, Double DQN, and dueling networks.
Review Volume 3 (DQN and variants) and preview Volume 4 (Policy Gradients). From value-based to policy-based methods.