REINFORCE
5 quick questions after Chapters 31–35 of Volume 4. Check you're ready to continue.
When a stochastic policy is essential; why a deterministic policy fails.
REINFORCE for CartPole with a softmax policy; note the variance.
State-value baseline with REINFORCE; compare gradient variance.
Review Volume 4 (Policy Gradients, Actor-Critic, DDPG, TD3) and preview Volume 5 (PPO, TRPO, SAC — stable, scalable policy optimization).
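
The variance comparison named in the REINFORCE and state-value baseline items above can be illustrated with a small sketch. This is not the CartPole exercise itself: it assumes a hypothetical single-state problem with three actions and fixed rewards, so the state value and the exact policy gradient can be computed by hand. The function names, the reward values, and the sample count are all illustrative assumptions. It draws single-sample REINFORCE gradient estimates for a softmax policy with and without a baseline and compares their variance.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_grad_samples(logits, rewards, baseline, n, rng):
    """Draw n single-sample REINFORCE gradient estimates w.r.t. the logits.

    For a softmax policy, d log pi(a) / d logit_i = 1[i == a] - pi(i).
    Each estimate is that score vector times (reward - baseline).
    """
    probs = softmax(logits)
    k = len(logits)
    samples = []
    for _ in range(n):
        a = rng.choices(range(k), weights=probs)[0]
        adv = rewards[a] - baseline
        samples.append(
            [((1.0 if i == a else 0.0) - probs[i]) * adv for i in range(k)]
        )
    return samples


def mean_and_total_variance(samples):
    """Per-component mean and the variance summed over components."""
    n = len(samples)
    k = len(samples[0])
    means = [sum(s[i] for s in samples) / n for i in range(k)]
    total_var = sum(
        sum((s[i] - means[i]) ** 2 for s in samples) / n for i in range(k)
    )
    return means, total_var


# Toy setup (assumed for illustration): one state, three actions.
rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
rewards = [10.0, 11.0, 12.0]

probs = softmax(logits)
# State value under the current policy, used as the baseline.
value = sum(p * r for p, r in zip(probs, rewards))
# Exact gradient of expected reward w.r.t. each logit: pi(i) * (r_i - V).
exact_grad = [p * (r - value) for p, r in zip(probs, rewards)]

plain = reinforce_grad_samples(logits, rewards, 0.0, 5000, rng)
based = reinforce_grad_samples(logits, rewards, value, 5000, rng)

mean_plain, var_plain = mean_and_total_variance(plain)
mean_based, var_based = mean_and_total_variance(based)

print("exact gradient:        ", exact_grad)
print("mean without baseline: ", mean_plain)
print("mean with baseline:    ", mean_based)
print("total variance without baseline:", var_plain)
print("total variance with baseline:   ", var_based)
```

Both estimators average toward the same exact gradient, since subtracting a baseline that does not depend on the action leaves the expectation unchanged. Only the spread of the individual samples differs. In a multi-state task such as CartPole, the baseline would typically be a learned state-value estimate rather than a quantity computed in closed form.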