Linear function approximation, neural networks for RL, Deep Q-Networks (DQN), experience replay, target networks, Double DQN, Dueling DQN, prioritized replay, Noisy Networks, and Rainbow. Chapters 21–30.
Chapter 21: Linear Function Approximation
Learning objectives

- Represent the action-value function as \(Q(s,a;w) = w^\top \phi(s,a)\) with a feature vector \(\phi\).
- Use tile coding (overlapping grid tilings) to produce sparse binary features for continuous state spaces (e.g. MountainCar).
- Implement semi-gradient SARSA: update \(w\) toward the TD target, using the current \(Q\) estimate for the next state–action pair.

Concept and real-world RL

Linear function approximation represents \(Q(s,a) \approx w^\top \phi(s,a)\), where the weights \(w\) are learned from data and \(\phi(s,a)\) is a fixed, hand-designed feature map. Tile coding covers the state space with several overlapping tilings; each tiling is a grid, and the feature vector has a 1 for the one tile in each tiling that contains the state (paired with the action), yielding a sparse binary vector. Because nearby states activate overlapping tiles, the representation generalizes across similar states. Semi-gradient methods use the TD target \(r + \gamma\, Q(s', a'; w)\) but treat the next-state value as a constant when taking the gradient, so the update is \(w \leftarrow w + \alpha\,[r + \gamma\, Q(s',a';w) - Q(s,a;w)]\,\phi(s,a)\) (no gradient flows through the target). Linear FA is the simplest form of value approximation and appears both in legacy RL systems and as a baseline. ...
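The pieces above can be combined into a short sketch: a minimal tile coder over a 2-D state space (bounds chosen to resemble MountainCar) and one semi-gradient SARSA update. All names here (`TileCoder`, `semi_gradient_sarsa_update`, the tiling counts and step size) are illustrative choices, not a reference implementation.

```python
import numpy as np

class TileCoder:
    """Minimal tile coder: n_tilings overlapping grids over a 2-D state space.
    Illustrative sketch; bounds default to MountainCar's (position, velocity)."""
    def __init__(self, n_tilings=8, bins=8, low=(-1.2, -0.07), high=(0.6, 0.07), n_actions=3):
        self.n_tilings, self.bins, self.n_actions = n_tilings, bins, n_actions
        self.low, self.high = np.array(low, float), np.array(high, float)
        self.tile_width = (self.high - self.low) / bins
        # each tiling is shifted by a fraction of one tile width
        self.offsets = [i / n_tilings * self.tile_width for i in range(n_tilings)]
        self.n_features = n_tilings * bins * bins * n_actions

    def active_indices(self, state, action):
        """Indices of the single active tile per tiling (sparse binary phi(s, a))."""
        idx = []
        for t, off in enumerate(self.offsets):
            coords = ((np.array(state, float) - self.low + off) / self.tile_width).astype(int)
            coords = np.clip(coords, 0, self.bins - 1)
            flat = coords[0] * self.bins + coords[1]
            idx.append((t * self.bins * self.bins + flat) * self.n_actions + action)
        return idx

def q_value(w, coder, state, action):
    # w^T phi(s, a): with binary features this is just a sum of active weights
    return sum(w[i] for i in coder.active_indices(state, action))

def semi_gradient_sarsa_update(w, coder, s, a, r, s_next, a_next,
                               alpha=0.1, gamma=1.0, done=False):
    """One semi-gradient SARSA step: the target uses current Q but is held constant."""
    target = r if done else r + gamma * q_value(w, coder, s_next, a_next)
    delta = target - q_value(w, coder, s, a)
    for i in coder.active_indices(s, a):
        # gradient of w^T phi w.r.t. w is phi itself (1 on active tiles);
        # alpha is divided by n_tilings so the effective step size is stable
        w[i] += alpha / coder.n_tilings * delta
    return w

coder = TileCoder()
w = np.zeros(coder.n_features)
w = semi_gradient_sarsa_update(w, coder, s=(-0.5, 0.0), a=1,
                               r=-1.0, s_next=(-0.48, 0.01), a_next=1)
```

Note the `alpha / n_tilings` scaling: because each state activates exactly one tile per tiling, dividing by the number of tilings keeps the total update magnitude independent of how many tilings are used.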