Chapter 54: Monte Carlo Tree Search (MCTS)

Learning objectives

- Implement MCTS for a small game (e.g. tic-tac-toe): selection (UCT), expansion, simulation (rollout), backpropagation.
- Use UCT (Upper Confidence bound for Trees) for node selection: \(\frac{Q(s,a)}{N(s,a)} + c \sqrt{\frac{\log N(s)}{N(s,a)}}\).
- Evaluate win rate against a random opponent.

Concept and real-world RL

MCTS builds a search tree by repeatedly selecting a leaf (UCT), expanding it, performing a random rollout to the end of the game, and backpropagating the result. It does not require a learned value function (though it can use one, as in AlphaZero). In game AI (chess, Go, tic-tac-toe), MCTS is used for planning and action selection; it balances exploration (trying under-visited moves) and exploitation (favoring moves with high estimated value). ...
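The four phases above can be sketched in a minimal tic-tac-toe implementation. This is an illustrative sketch, not a reference implementation: the names (`Node`, `mcts`, `rollout`, `WIN_LINES`) and the exploration constant `c = 1.4` are assumptions, and rewards are scored as 1 for a win, 0.5 for a draw, 0 for a loss from the perspective of the player who made the move into each node.

```python
import math
import random

# Rows, columns, and diagonals of a 3x3 board stored as a flat list of 9 cells.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that mark has a completed line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, v in enumerate(board) if v is None]

class Node:
    def __init__(self, board, player, parent=None, move=None):
        self.board, self.player = board, player      # player = who moves next
        self.parent, self.move = parent, move
        self.children = []
        self.untried = legal_moves(board)
        self.N, self.Q = 0, 0.0                      # visit count, total reward

    def uct_select(self, c=1.4):
        # UCT: Q(s,a)/N(s,a) + c * sqrt(log N(s) / N(s,a))
        return max(self.children,
                   key=lambda ch: ch.Q / ch.N
                   + c * math.sqrt(math.log(self.N) / ch.N))

def rollout(board, player):
    """Play random moves to the end; return the winning mark or None (draw)."""
    board = board[:]
    while winner(board) is None and legal_moves(board):
        board[random.choice(legal_moves(board))] = player
        player = 'O' if player == 'X' else 'X'
    return winner(board)

def mcts(board, player, iters=500):
    root = Node(board[:], player)
    for _ in range(iters):
        node = root
        # 1. Selection: descend via UCT while fully expanded and non-terminal.
        while not node.untried and node.children:
            node = node.uct_select()
        # 2. Expansion: add one child for an untried move.
        if node.untried and winner(node.board) is None:
            m = node.untried.pop(random.randrange(len(node.untried)))
            b = node.board[:]
            b[m] = node.player
            node.children.append(
                Node(b, 'O' if node.player == 'X' else 'X', node, m))
            node = node.children[-1]
        # 3. Simulation: random rollout from the new leaf (or terminal state).
        w = winner(node.board) or rollout(node.board, node.player)
        # 4. Backpropagation: credit the player who moved INTO each node.
        while node is not None:
            node.N += 1
            if w is None:
                node.Q += 0.5                        # draw
            elif w != node.player:                   # the mover into node won
                node.Q += 1.0
            node = node.parent
    # Act with the most-visited move, a common robust choice.
    return max(root.children, key=lambda ch: ch.N).move
```

To evaluate win rate against a random opponent, call `mcts` for one side and `random.choice(legal_moves(board))` for the other over many games; with a few hundred iterations per move the MCTS player should win or draw essentially every game of tic-tac-toe.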