Problems with standard policy gradients; TRPO; PPO (intuition and implementation); GAE; maximum entropy RL; Soft Actor-Critic (SAC); SAC vs PPO; custom continuous environments; and advanced tuning. Chapters 41–50.