Chapter 31: Introduction to Policy-Based Methods

Learning objectives

- Explain when a stochastic policy (outputting a distribution over actions) is essential versus when a deterministic policy suffices.
- Give a real-world scenario where a deterministic policy would fail (e.g. games with hidden information, adversarial settings).
- Relate stochastic policies to exploration and to game AI or recommendation settings where diversity matters.

Concept and real-world RL

Policy-based methods directly parameterize and optimize the policy \(\pi(a|s;\theta)\) instead of learning a value function and deriving actions from it. A stochastic policy outputs a probability distribution over actions; a deterministic policy always picks the same action in a given state. In game AI, when the opponent can observe or anticipate your move (e.g. poker, rock-paper-scissors), a deterministic policy is exploitable: the opponent always knows what you will do. A stochastic policy keeps the opponent uncertain and is essential for mixed strategies. In recommendation, showing a deterministic "best" item every time can create filter bubbles; stochastic policies (or sampling from a distribution) encourage exploration and diversity. For robot navigation in partially observable or noisy settings, randomness can help escape local minima or handle uncertainty. ...
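The exploitability point can be sketched with a toy rock-paper-scissors policy. This is a minimal NumPy illustration, not code from the chapter; the logits, seed, and sample count are made-up assumptions:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over action preferences.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)

# Deterministic policy: always plays the argmax action, so an opponent
# who has seen a few rounds can counter it perfectly.
logits = np.array([2.0, 0.5, 0.1])   # hypothetical preferences: rock, paper, scissors
deterministic_action = int(np.argmax(logits))

# Stochastic policy: samples from the softmax distribution. Uniform logits
# give the uniform mixed strategy, which is unexploitable in this game.
probs = softmax(np.zeros(3))
samples = np.array([rng.choice(3, p=probs) for _ in range(1000)])
counts = np.bincount(samples, minlength=3)
```

Here `deterministic_action` is always 0 (rock), while the stochastic policy spreads its 1000 sampled moves roughly evenly across the three actions.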

March 10, 2026 · 3 min · 547 words · codefrydev

Chapter 33: The REINFORCE Algorithm

Learning objectives

- Implement REINFORCE (Monte Carlo policy gradient): estimate \(\nabla_\theta J\) using the return \(G_t\) from full episodes.
- Use a neural-network policy with a softmax output for discrete actions (e.g. CartPole).
- Observe and explain the high variance of gradient estimates when using raw returns \(G_t\) (no baseline).

Concept and real-world RL

REINFORCE is the simplest policy gradient algorithm: run an episode under \(\pi_\theta\), compute the return \(G_t\) at each step, and update \(\theta \leftarrow \theta + \alpha \sum_t G_t \nabla_\theta \log \pi(a_t|s_t)\). It is on-policy and Monte Carlo (it needs full episodes). The variance of \(G_t\) can be large, especially in long episodes, which makes learning slow or unstable. In game AI, REINFORCE is a baseline for more advanced methods (actor-critic, PPO); in robot control, it is rarely used alone because of its sample inefficiency and variance. Adding a baseline (e.g. a state-value function) reduces variance without introducing bias. ...
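The update rule above can be sketched with a linear softmax policy instead of a neural network (the gradient of \(\log \pi\) has a closed form there). A minimal sketch, assuming a made-up one-step "bandit" episode; the class and function names are illustrative, not from the chapter:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class SoftmaxPolicy:
    """Linear softmax policy: pi(a|s) = softmax(theta @ s) over discrete actions."""
    def __init__(self, n_features, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.theta = 0.01 * rng.standard_normal((n_actions, n_features))

    def probs(self, s):
        return softmax(self.theta @ s)

    def grad_log_pi(self, s, a):
        # d/dtheta log pi(a|s) = (one_hot(a) - pi(.|s)) outer s
        g = -np.outer(self.probs(s), s)
        g[a] += s
        return g

def reinforce_update(policy, episode, alpha=0.1, gamma=0.99):
    """One REINFORCE update from a full episode of (state, action, reward) triples."""
    G = 0.0
    grad = np.zeros_like(policy.theta)
    for s, a, r in reversed(episode):
        G = r + gamma * G                       # return G_t, computed backwards
        grad += G * policy.grad_log_pi(s, a)    # G_t * grad log pi(a_t|s_t)
    policy.theta += alpha * grad                # ascend the policy gradient

# Toy usage: one-step episodes where action 0 always earns reward 1,
# so its probability should rise toward 1.
s = np.array([1.0])
policy = SoftmaxPolicy(n_features=1, n_actions=2)
for _ in range(50):
    reinforce_update(policy, [(s, 0, 1.0)], alpha=0.5)
```

Because every return is positive and attached to action 0, the updates steadily push probability mass onto that action; with mixed or noisy rewards the same raw-\(G_t\) updates become much noisier, which is the variance problem the chapter highlights.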

March 10, 2026 · 3 min · 602 words · codefrydev

Chapter 34: Reducing Variance in Policy Gradients

Learning objectives

- Add a state-value baseline \(V(s)\) to REINFORCE and explain why it reduces variance without introducing bias (when the baseline does not depend on the action).
- Train the baseline network (e.g. an MSE fit to the returns \(G_t\)) alongside the policy.
- Compare the variance of gradient estimates (e.g. the magnitude of parameter updates, or the variance of \(G_t - b(s_t)\)) with and without a baseline.

Concept and real-world RL

The policy gradient with a baseline is \(\mathbb{E}[\nabla_\theta \log \pi(a|s) \, (G_t - b(s))]\). If \(b(s)\) does not depend on the action \(a\), this is still an unbiased estimate of \(\nabla_\theta J\); the baseline only changes the variance. A natural choice is \(b(s) = V^\pi(s)\), the expected return from state \(s\). The term \(G_t - V(s_t)\) is then an estimate of the advantage (how much better this trajectory was than average). In game AI or robot control, lower-variance gradients mean faster and more stable learning; baselines are standard in actor-critic methods and PPO. ...
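The variance reduction is easy to see numerically: raw returns \(G_t\) carry the spread *between* states' expected values, while \(G_t - V(s_t)\) keeps only the within-state noise. A minimal NumPy sketch; the two states, their values, and the noise model are made-up illustration data, and the per-state sample mean stands in for a learned \(V(s)\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two states with very different expected returns V(s), plus unit noise.
V = {"s1": 10.0, "s2": 2.0}
returns = {s: v + rng.standard_normal(1000) for s, v in V.items()}

# Without a baseline, the gradient signal scales with the raw returns G_t,
# whose variance includes the gap between the state means (~16 here).
raw = np.concatenate([returns["s1"], returns["s2"]])

# With b(s) = V(s) (estimated by the per-state mean), only the ~unit
# within-state noise remains in the advantage estimates G_t - b(s_t).
baselined = np.concatenate([g - g.mean() for g in returns.values()])

var_raw = raw.var()
var_baselined = baselined.var()
```

The baselined variance comes out roughly an order of magnitude smaller, while the expected gradient direction is unchanged, matching the unbiasedness argument above.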

March 10, 2026 · 3 min · 593 words · codefrydev