Chapter 35: Actor-Critic Architectures

Learning objectives: Sketch the architecture of a two-network actor-critic: an actor (policy \(\pi(a|s)\)) and a critic (value \(V(s)\) or \(Q(s,a)\)). Write pseudocode for the update steps using the TD error \(\delta = r + \gamma V(s') - V(s)\) as the advantage for the policy. Explain why the critic reduces variance compared to using Monte Carlo returns \(G_t\). Concept and real-world RL: Actor-critic methods maintain two networks: the actor selects actions from \(\pi(a|s;\theta)\), and the critic estimates the value function \(V(s;w)\) (or \(Q(s,a;w)\)). The TD error \(\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)\) is a one-step estimate of the advantage; it is biased (because \(V\) is approximate) but has much lower variance than \(G_t\). The actor is updated along \(\nabla_\theta \log \pi(a_t|s_t)\,\delta_t\); the critic is updated to minimize \((r_t + \gamma V(s_{t+1}) - V(s_t))^2\). In robot control and game AI, actor-critic allows online, step-by-step updates instead of waiting for the end of an episode, which speeds up learning. ...
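The update steps above can be sketched with linear function approximation. This is a minimal illustration, not the chapter's implementation: the feature vectors, learning rates, and the single random transition are hypothetical stand-ins for a real environment and neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 4, 2
theta = np.zeros((n_features, n_actions))  # actor parameters
w = np.zeros(n_features)                   # critic parameters
alpha_actor, alpha_critic, gamma = 0.01, 0.1, 0.99

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def actor_critic_step(phi_s, phi_s_next, r, done):
    """One online actor-critic update using the TD error as the advantage."""
    global theta, w
    probs = softmax(phi_s @ theta)
    a = rng.choice(n_actions, p=probs)
    # TD error: delta = r + gamma * V(s') - V(s)
    v_s = w @ phi_s
    v_next = 0.0 if done else w @ phi_s_next
    delta = r + gamma * v_next - v_s
    # Critic: semi-gradient step on the squared TD error
    w += alpha_critic * delta * phi_s
    # Actor: policy gradient, grad log pi(a|s) for a linear-softmax policy
    grad_log_pi = np.outer(phi_s, -probs)
    grad_log_pi[:, a] += phi_s
    theta += alpha_actor * delta * grad_log_pi
    return delta

# One illustrative update on random state features
phi, phi_next = rng.normal(size=n_features), rng.normal(size=n_features)
delta = actor_critic_step(phi, phi_next, r=1.0, done=False)
```

Because both parameter vectors start at zero, the first TD error equals the reward; after one call, both actor and critic have moved, which is exactly the "online, step-by-step" property noted above.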

March 10, 2026 · 3 min · 577 words · codefrydev

Chapter 37: Asynchronous Advantage Actor-Critic (A3C)

Learning objectives: Implement A3C: multiple worker processes, each running its own environment and asynchronously updating a global shared network. Understand the trade-off: A3C can be faster on multi-core CPUs (no synchronization wait) but is often less stable than A2C due to asynchronous gradient updates. Compare training speed (wall clock and/or sample efficiency) of A3C vs A2C on CartPole. Concept and real-world RL: A3C (Asynchronous Advantage Actor-Critic) runs multiple workers in parallel, each collecting experience and pushing gradient updates to a global network. Workers do not wait for each other, so gradients are asynchronous and potentially stale. In game AI and early deep RL, A3C was popular for leveraging many CPU cores; in practice, A2C (synchronous) or PPO often gives more stable and reproducible results. The idea of parallel environments and shared parameters remains central; the main difference is synchronous (A2C) vs asynchronous (A3C) updates. ...
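The asynchronous-update pattern can be sketched in a few lines. This is a toy sketch under stated assumptions: the quadratic "loss" and random targets stand in for a real actor-critic loss on rollouts, and Python threads stand in for worker processes; the point is only that each worker reads a possibly stale snapshot of the shared parameters and applies its gradient without waiting.

```python
import threading
import numpy as np

global_params = np.zeros(4)  # shared "global network" parameters
lr = 0.1

def worker(seed, n_updates=50):
    rng = np.random.default_rng(seed)
    for _ in range(n_updates):
        # Snapshot of the (possibly stale) global parameters
        local = global_params.copy()
        target = rng.normal(size=4)
        grad = local - target  # gradient of 0.5 * ||local - target||^2
        # Asynchronous, Hogwild-style update: no lock, no waiting
        global_params[:] = global_params - lr * grad

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

An A2C variant of the same loop would instead gather all workers' gradients at a barrier, average them, and apply a single synchronous update, which is exactly the sync-vs-async distinction drawn above.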

March 10, 2026 · 3 min · 556 words · codefrydev

Chapter 57: Dreamer and Latent Imagination

Learning objectives: Implement a simplified Dreamer-style algorithm: train an RSSM-like model on collected trajectories, then roll out in latent space to train an actor-critic. Understand the imagination phase: no real environment steps; only latent rollouts for policy updates. Relate to robot control and sample-efficient RL. Concept and real-world RL: Dreamer learns a recurrent state-space model (RSSM) in latent space: encode the observation to a latent, predict the next latent given an action, and predict the reward and a continuation (discount) flag. The actor-critic is trained on imagined rollouts (latent only), so many gradient steps use no real environment interaction. In robot navigation and game AI, this yields high sample efficiency. The key is training the model and the policy on the same data so the latent space is useful for control. ...
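The imagination phase can be sketched as a rollout of the learned dynamics with no environment calls at all. In this sketch the "learned" model is a fixed linear system and the actor is a fixed function, both hypothetical stand-ins; a real Dreamer uses a trained RSSM and optimizes the actor-critic on these imagined trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, action_dim, horizon, gamma = 8, 2, 15, 0.99

# Stand-ins for learned model components (fixed random matrices here)
A = 0.9 * np.eye(latent_dim)                               # latent transition
B = rng.normal(scale=0.1, size=(latent_dim, action_dim))   # action influence
reward_head = rng.normal(size=latent_dim)                  # linear reward predictor

def policy(z):
    """Hypothetical actor: a fixed bounded policy in latent space."""
    return np.tanh(z[:action_dim])

def imagine(z0):
    """Roll out entirely in latent space; no real env steps are taken."""
    z, ret = z0, 0.0
    for t in range(horizon):
        a = policy(z)
        z = A @ z + B @ a                         # predicted next latent
        ret += (gamma ** t) * (reward_head @ z)   # predicted (discounted) reward
    return ret, z

imagined_return, z_final = imagine(rng.normal(size=latent_dim))
```

The imagined discounted return is what the actor would be trained to increase and the critic to predict; every gradient step over such rollouts is "free" in terms of real environment samples.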

March 10, 2026 · 3 min · 464 words · codefrydev

Function Approximation and Deep RL

This page covers the function approximation and deep RL concepts you need for the preliminary assessment: why we need function approximation (FA), the policy gradient update, exploration in DQN, experience replay, and the advantage of actor-critic. Why this matters for RL: In large or continuous state spaces we cannot store a value per state; we use a parameterized function (e.g. a neural network) to approximate values or policies. That leads to policy gradient methods (maximize return) and value-based methods with FA (e.g. DQN). DQN uses experience replay and exploration (e.g. ε-greedy); actor-critic combines a policy (actor) and a value function (critic) for lower-variance policy gradients. You need to understand why FA is necessary and how these pieces fit together. ...
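Two of the DQN ingredients named above fit in a few lines. This is a minimal sketch, not a full DQN: the Q-values are a hypothetical list rather than a neural network's output, and the buffer stores plain tuples.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay: old transitions are evicted."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform random minibatch breaks the temporal correlation of rollouts
        return random.sample(self.buffer, batch_size)

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore uniformly, otherwise exploit argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

buf = ReplayBuffer()
for i in range(100):
    buf.push(i, i % 2, 1.0, i + 1, False)  # hypothetical transitions
batch = buf.sample(32)
action = epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.0)  # greedy: picks index 1
```

Sampling uniformly from the buffer rather than training on consecutive steps is what decorrelates DQN's updates; ε-greedy supplies the exploration the page lists alongside it.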

March 10, 2026 · 7 min · 1400 words · codefrydev