Volume 3: Value Function Approximation & Deep Q-Learning
Chapters 21–30 — Linear FA, neural nets for RL, DQN, replay, target networks, DDQN, Dueling, PER, NoisyNet, Rainbow.
Linear FA with tile coding for MountainCar; semi-gradient SARSA.
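A minimal sketch of the chapter's ingredients, assuming MountainCar's standard state bounds; the tiling layout, function names, and step size below are illustrative choices, not the chapter's exact code:

```python
import numpy as np

def tile_features(pos, vel, n_tilings=8, n_tiles=8,
                  pos_range=(-1.2, 0.6), vel_range=(-0.07, 0.07)):
    """Return the active-tile indices for a (position, velocity) pair.

    Each of the n_tilings grids is shifted by a fraction of one tile,
    so the binary feature vector has exactly n_tilings active entries."""
    feats = []
    pos_w = (pos_range[1] - pos_range[0]) / n_tiles
    vel_w = (vel_range[1] - vel_range[0]) / n_tiles
    for t in range(n_tilings):
        off = t / n_tilings  # fractional offset per tiling
        pi = min(max(int((pos - pos_range[0]) / pos_w + off), 0), n_tiles)
        vi = min(max(int((vel - vel_range[0]) / vel_w + off), 0), n_tiles)
        feats.append(t * (n_tiles + 1) ** 2 + pi * (n_tiles + 1) + vi)
    return feats

n_actions, n_feats = 3, 8 * 9 * 9
w = np.zeros((n_actions, n_feats))  # one linear weight vector per action

def q(feats, a):
    return w[a, feats].sum()  # linear value: sum of active-tile weights

# One semi-gradient SARSA update; the gradient of a linear FA is just
# the (binary) feature vector, so only the active tiles change.
alpha, gamma = 0.1 / 8, 0.99
s, a = tile_features(-0.5, 0.0), 2
s2, a2 = tile_features(-0.48, 0.01), 2
delta = -1.0 + gamma * q(s2, a2) - q(s, a)  # reward is -1 per step
w[a, s] += alpha * delta
```

Dividing alpha by the number of tilings keeps the effective step size stable as tilings are added.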
Designing state and state-action features for linear value approximation.
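One common construction for state-action features under a linear approximator is to give each action its own copy of the state features; the helper name below is illustrative:

```python
import numpy as np

def state_action_features(x, a, n_actions):
    """Place the state features x into action a's slot and zeros
    elsewhere, so phi(s, a) has length n_actions * len(x) and a single
    weight vector w yields q(s, a) = w . phi(s, a)."""
    phi = np.zeros(n_actions * len(x))
    phi[a * len(x):(a + 1) * len(x)] = x
    return phi

x = np.array([1.0, 0.5, -0.2])        # state features
phi = state_action_features(x, 1, 3)  # action 1 of 3
```

This is equivalent to keeping a separate weight vector per action, which is how the tile-coding example stores its weights.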
The CartPole (Inverted Pendulum) environment: state, actions, and solving it with value-based or policy-based methods.
Two-hidden-layer PyTorch network for Q-values; MSE loss.
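The chapter builds the network in PyTorch; as a dependency-free stand-in, the same two-hidden-layer forward pass and MSE loss can be sketched in numpy (layer width 64 and the ReLU choice are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def init(n_in, n_hidden, n_out):
    """Initialize three weight matrices with 1/sqrt(fan-in) scaling."""
    s = lambda a, b: rng.normal(0, 1 / np.sqrt(a), (a, b))
    return [s(n_in, n_hidden), s(n_hidden, n_hidden), s(n_hidden, n_out)]

def forward(params, x):
    W1, W2, W3 = params
    h1 = np.maximum(0.0, x @ W1)   # hidden layer 1, ReLU
    h2 = np.maximum(0.0, h1 @ W2)  # hidden layer 2, ReLU
    return h2 @ W3                 # linear head: one Q-value per action

def mse(pred, target):
    return np.mean((pred - target) ** 2)

params = init(4, 64, 2)                  # CartPole: 4-dim state, 2 actions
qvals = forward(params, np.ones((5, 4)))  # batch of 5 states
loss = mse(qvals[:, 0], np.zeros(5))      # regress chosen-action Q to targets
```

The key design point carries over to the PyTorch version: the network outputs all action values in one forward pass, and the loss is taken only on the action actually executed.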
DQN for CartPole with replay and target network.
Replay buffer class with push and sample.
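The buffer described here can be sketched with a stdlib deque (class and method names match the chapter's description; capacity and batch size are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer of (s, a, r, s', done) transitions;
    the deque evicts the oldest transition once capacity is reached."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s2, done):
        self.buf.append((s, a, r, s2, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive transitions.
        batch = random.sample(self.buf, batch_size)
        return list(zip(*batch))  # columns: states, actions, rewards, ...

    def __len__(self):
        return len(self.buf)

buf = ReplayBuffer(capacity=1000)
for t in range(50):
    buf.push(t, 0, 1.0, t + 1, False)
states, actions, rewards, next_states, dones = buf.sample(8)
```

Returning column tuples makes it easy to convert each column to a tensor in one call inside the training loop.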
Hard vs soft target updates in DQN.
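The two update rules compared in this chapter, sketched on toy parameter dicts standing in for state_dicts (tau and the function names are illustrative):

```python
def hard_update(target, online):
    """Copy the online weights into the target network every C steps."""
    target.update(online)

def soft_update(target, online, tau=0.005):
    """Polyak averaging every step: target <- tau*online + (1-tau)*target."""
    for k in online:
        target[k] = tau * online[k] + (1 - tau) * target[k]

online = {"w": 1.0}
target = {"w": 0.0}
soft_update(target, online, tau=0.1)
after_soft = target["w"]      # moved 10% of the way toward the online net
hard_update(target, online)   # now an exact copy
```

Hard updates give a target that is stale but perfectly stable between copies; soft updates give a target that tracks the online net smoothly.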
Double DQN: online selects, target evaluates; compare with DQN.
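The "online selects, target evaluates" rule amounts to a one-line change in the target computation; the toy Q-values below are chosen to show the maximization bias that Double DQN removes (function names are illustrative):

```python
import numpy as np

def dqn_target(r, q_next_target, gamma=0.99, done=False):
    """Vanilla DQN: the target net both selects and evaluates the max,
    so any overestimated action inflates the target."""
    return r + (0.0 if done else gamma * np.max(q_next_target))

def double_dqn_target(r, q_next_online, q_next_target, gamma=0.99, done=False):
    """Double DQN: the online net picks argmax-a, the target net scores it."""
    a_star = int(np.argmax(q_next_online))
    return r + (0.0 if done else gamma * q_next_target[a_star])

q_online = np.array([1.0, 2.0])  # online net prefers action 1
q_target = np.array([5.0, 0.5])  # target net overestimates action 0
y_dqn = dqn_target(1.0, q_target)                    # uses the inflated 5.0
y_ddqn = double_dqn_target(1.0, q_online, q_target)  # evaluates action 1
```

Because selection and evaluation errors no longer compound, the Double DQN target here is far smaller than the DQN one.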
Dueling architecture V(s) + A(s,a); compare with DQN.
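The dueling head's combining step can be shown in isolation; subtracting the mean advantage is the identifiability trick (the stream outputs below are made-up numbers):

```python
import numpy as np

def dueling_q(v, adv):
    """Combine scalar V(s) and per-action A(s,a) into Q(s,a):
    Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a').
    Without the mean subtraction, (V + c, A - c) would give the same Q,
    leaving the two streams unidentifiable."""
    return v + adv - adv.mean(axis=-1, keepdims=True)

v = np.array([[2.0]])               # value-stream output for one state
adv = np.array([[1.0, -1.0, 0.0]])  # advantage stream, 3 actions
qvals = dueling_q(v, adv)
```

Note the combination is monotone in the advantages, so the greedy action is unchanged; what the architecture buys is a shared estimate of V(s) across all actions.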
Sum-tree prioritized buffer with TD-error priorities; importance-sampling weights to correct the sampling bias.
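A compact sketch of the data structure, assuming priorities p_i = |delta_i|^alpha with alpha = 0.6 and beta = 0.4 (standard PER hyperparameters; the class layout is illustrative):

```python
import random

class SumTree:
    """Binary sum tree: leaves hold priorities, internal nodes hold
    subtree sums, so sampling proportionally to priority is O(log n)."""
    def __init__(self, capacity):
        self.n = capacity
        self.tree = [0.0] * (2 * capacity)  # leaves live at [n, 2n)

    def update(self, i, p):
        j = i + self.n
        self.tree[j] = p
        j //= 2
        while j >= 1:  # propagate the changed sum up to the root
            self.tree[j] = self.tree[2 * j] + self.tree[2 * j + 1]
            j //= 2

    def total(self):
        return self.tree[1]

    def sample(self):
        """Walk down from the root, descending into whichever child's
        subtree contains the sampled mass; returns a leaf index."""
        v, j = random.uniform(0, self.total()), 1
        while j < self.n:
            if v <= self.tree[2 * j]:
                j = 2 * j
            else:
                v -= self.tree[2 * j]
                j = 2 * j + 1
        return j - self.n

tree = SumTree(4)
td_errors = [0.5, 2.0, 0.1, 1.0]
for i, td in enumerate(td_errors):
    tree.update(i, td ** 0.6)   # p_i = |delta_i|^alpha
i = tree.sample()
p = tree.tree[i + tree.n] / tree.total()  # P(i), the sampling probability
w = (1.0 / (4 * p)) ** 0.4                # IS weight (N * P(i))^{-beta}
```

After each learning step the sampled transitions' priorities are refreshed via update() with their new TD errors.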
Noisy linear layers with factorized Gaussian; compare with ε-greedy.
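A numpy sketch of one noisy linear layer with factorized Gaussian noise, assuming the standard sigma_0 = 0.5 initialization; the chapter's version would be a torch.nn module, and the names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Factorized-noise squashing function: f(x) = sgn(x) * sqrt(|x|)."""
    return np.sign(x) * np.sqrt(np.abs(x))

def noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b):
    """Weights are mu_w + sigma_w * (f(eps_in) outer f(eps_out)), and
    similarly for the bias: only n_in + n_out noise samples per forward
    pass instead of n_in * n_out. The learned sigmas let the network
    anneal its own exploration, replacing an epsilon-greedy schedule."""
    n_in, n_out = mu_w.shape
    eps_in, eps_out = f(rng.normal(size=n_in)), f(rng.normal(size=n_out))
    w = mu_w + sigma_w * np.outer(eps_in, eps_out)
    b = mu_b + sigma_b * eps_out
    return x @ w + b

n_in, n_out = 4, 2
mu_w = rng.normal(0, 1 / np.sqrt(n_in), (n_in, n_out))
sigma_w = np.full((n_in, n_out), 0.5 / np.sqrt(n_in))  # sigma_0 = 0.5
mu_b, sigma_b = np.zeros(n_out), np.full(n_out, 0.5 / np.sqrt(n_in))
y = noisy_linear(np.ones(n_in), mu_w, sigma_w, mu_b, sigma_b)
```

Because the noise is resampled on every forward pass, two calls on the same input give different outputs, which is exactly the exploration signal.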
Combine DDQN, Dueling, PER, NoisyNet, multi-step; train on Pong.
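The full Pong agent is a training loop over the components above; the one piece not yet shown is the multi-step target, sketched here (the function name and the 3-step / gamma = 0.9 numbers are illustrative):

```python
def n_step_return(rewards, gamma, bootstrap_q, done=False):
    """Multi-step target: sum_{k=0}^{n-1} gamma^k r_k + gamma^n * Q(s_n),
    where Q(s_n) is the bootstrap value (e.g. the Double DQN estimate
    at the n-th state). The bootstrap term is dropped if the episode
    terminated inside the n-step window."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += gamma ** k * r
    if not done:
        g += gamma ** len(rewards) * bootstrap_q
    return g

# 3-step return with gamma = 0.9 and a bootstrap value of 10.0
g = n_step_return([1.0, 0.0, 1.0], 0.9, 10.0)
```

In a Rainbow-style agent, transitions are stored with their precomputed n-step reward and n-th next state, so the replay buffer and loss need almost no changes.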
15 short drill problems for Volume 3: linear FA, semi-gradient TD, DQN, replay buffer, target network, Double DQN, and dueling networks.
Review Volume 3 (DQN and variants) and preview Volume 4 (Policy Gradients). From value-based to policy-based methods.