Chapter 22: Artificial Neural Networks for RL
Learning objectives

- Build a feedforward neural network in PyTorch that maps a state to Q-values (one output per action).
- Implement the forward pass and an MSE loss between predicted Q-values and targets.
- Understand how this network will be used in DQN (next chapter): the TD target and the gradient update.

Concept and real-world RL

Neural networks as function approximators let us represent \(Q(s,a)\) (or \(Q(s)\) with one output per action) for high-dimensional or continuous state spaces. The network takes the state (and optionally the action) as input and outputs values; we train it by minimizing the TD error, e.g. the MSE between the predicted Q-value and the target \(r + \gamma \max_{a'} Q(s',a')\). This is the core of Deep Q-Networks (DQN) and many other deep RL algorithms. In practice, we use MLPs for low-dimensional states (e.g. CartPole) and CNNs for images (e.g. Atari).

...
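The pieces above can be sketched as a minimal PyTorch example: an MLP that maps a state to one Q-value per action, plus one TD-style MSE update on a batch of random transitions. The class name `QNetwork`, the layer sizes, and the CartPole-like dimensions (4-dim state, 2 actions) are illustrative assumptions, not a fixed API.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Feedforward network mapping a state vector to Q-values, one per action."""

    def __init__(self, state_dim: int, n_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),  # one output per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Illustrative dimensions (CartPole-like): 4-dim state, 2 discrete actions.
q_net = QNetwork(state_dim=4, n_actions=2)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# One TD update on a fake batch of transitions (s, a, r, s', done).
batch = 8
s = torch.randn(batch, 4)
a = torch.randint(0, 2, (batch,))
r = torch.randn(batch)
s_next = torch.randn(batch, 4)
done = torch.zeros(batch)
gamma = 0.99

# Predicted Q(s, a): select the Q-value of the action actually taken.
q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

# TD target r + gamma * max_a' Q(s', a'); no_grad keeps the target fixed.
with torch.no_grad():
    target = r + gamma * (1 - done) * q_net(s_next).max(dim=1).values

loss = nn.functional.mse_loss(q_pred, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In full DQN (next chapter) the target would come from a separate, periodically synced target network and the transitions from a replay buffer; here the same network and random tensors stand in to keep the sketch self-contained.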