Chapter 31: Introduction to Policy-Based Methods
Learning objectives

- Explain when a stochastic policy (one that outputs a distribution over actions) is essential versus when a deterministic policy suffices.
- Give a real-world scenario where a deterministic policy would fail (e.g. games with hidden information, adversarial settings).
- Relate stochastic policies to exploration, and to game AI or recommendation settings where diversity matters.

Concept and real-world RL

Policy-based methods directly parameterize and optimize the policy \(\pi(a|s;\theta)\) instead of learning a value function and deriving actions from it. A stochastic policy outputs a probability distribution over actions; a deterministic policy always picks the same action in a given state.

In game AI, when the opponent can observe or anticipate your move (e.g. poker, rock-paper-scissors), a deterministic policy is exploitable: the opponent always knows what you will do. A stochastic policy keeps the opponent uncertain and is essential for mixed strategies. In recommendation, showing the same deterministic "best" item every time can create filter bubbles; stochastic policies (or sampling from a distribution) encourage exploration and diversity. For robot navigation in partially observable or noisy settings, randomness can help the agent escape local minima and handle uncertainty. ...
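The contrast above can be sketched in a few lines of NumPy. This is a minimal illustration, not a training algorithm: the linear softmax parameterization, the feature dimensions, and the names `theta` and `state` are all assumptions made for the example. It shows how the same parameterized policy \(\pi(a|s;\theta)\) yields either a sampled (stochastic) action or a fixed argmax (deterministic) action.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical linear policy: action logits are a linear function of the state
theta = rng.normal(size=(3, 4))   # 3 actions, 4 state features (assumed shapes)
state = rng.normal(size=4)

probs = softmax(theta @ state)    # pi(a|s; theta), a distribution over actions

# Stochastic policy: sample an action from the distribution
stochastic_action = int(rng.choice(len(probs), p=probs))

# Deterministic policy: always pick the highest-probability action
deterministic_action = int(np.argmax(probs))
```

In a repeated adversarial game like rock-paper-scissors, an opponent who observes `deterministic_action` can counter it every round, while sampling via `stochastic_action` realizes a mixed strategy the opponent cannot predict.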