Chapter 81: Multi-Agent Fundamentals

Learning objectives

- Model a two-player zero-sum game (e.g. Rock-Paper-Scissors) as a Dec-POMDP (Decentralized Partially Observable MDP) or an equivalent multi-agent framework.
- Define states, observations, actions, and rewards for each agent in the game.
- Explain the difference between centralized (one controller sees everything) and decentralized (each agent has its own observation and policy) formulations.
- Identify how the same game can be viewed as a normal-form game (payoff matrix) and as a sequential Dec-POMDP (if we add structure).
- Relate multi-agent modeling to game AI (opponents, teammates) and trading (multiple market participants).

Concept and real-world RL ...
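As a concrete illustration of these objectives, here is a minimal sketch (with assumed names like `PAYOFF` and `step`) of Rock-Paper-Scissors as a two-agent normal-form game: each agent picks an action, and the per-agent rewards are zero-sum, which is the payoff structure a Dec-POMDP formulation would build on.

```python
# Rock-Paper-Scissors as a two-agent zero-sum normal-form game (sketch).

ACTIONS = ["rock", "paper", "scissors"]

# Payoff for agent 0; agent 1 receives the negation (zero-sum).
# PAYOFF[a0][a1] = +1 win, 0 tie, -1 loss for agent 0.
PAYOFF = {
    "rock":     {"rock": 0,  "paper": -1, "scissors": +1},
    "paper":    {"rock": +1, "paper": 0,  "scissors": -1},
    "scissors": {"rock": -1, "paper": +1, "scissors": 0},
}

def step(a0: str, a1: str) -> tuple[int, int]:
    """Joint action -> per-agent rewards. In the decentralized view,
    each agent observes only its own action and reward."""
    r0 = PAYOFF[a0][a1]
    return r0, -r0  # zero-sum: the two rewards always cancel

# Sanity check: every joint action's rewards sum to zero.
assert all(sum(step(a, b)) == 0 for a in ACTIONS for b in ACTIONS)

print(step("rock", "scissors"))  # rock beats scissors: (1, -1)
```

A centralized controller would see the joint action `(a0, a1)`; a decentralized agent sees only its own side of `step`, which is what makes the multi-agent learning problem hard.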

March 10, 2026 · 4 min · 673 words · codefrydev

Chapter 82: Game Theory Basics for RL

Learning objectives

- Compute the Nash equilibrium of a simple 2×2 game (e.g. Prisoner’s Dilemma) from the payoff matrix.
- Explain why independent learning (each agent learns its best response without knowing the other’s policy) might converge to an outcome that is not a Nash equilibrium, or might not converge at all.
- Compare Nash equilibrium payoffs with the payoffs that result from independent Q-learning or gradient-based learning in the same game.
- Identify the difference between cooperative, competitive, and mixed settings in terms of payoff structure.
- Relate game theory to game AI (opponent modeling) and trading (market equilibrium).

Concept and real-world RL ...
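The first objective can be sketched directly: a minimal brute-force search (with assumed standard Prisoner's Dilemma payoff values) that finds the pure-strategy Nash equilibria of a 2×2 game by checking whether either player gains from a unilateral deviation.

```python
# Pure-strategy Nash equilibria of the Prisoner's Dilemma by best-response checks.
from itertools import product

ACTIONS = ["cooperate", "defect"]

# (row payoff, column payoff); common textbook Prisoner's Dilemma numbers.
PAYOFF = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"):    (-3,  0),
    ("defect",    "cooperate"): ( 0, -3),
    ("defect",    "defect"):    (-2, -2),
}

def is_nash(a_row: str, a_col: str) -> bool:
    """A joint action is a Nash equilibrium iff neither player can improve
    its own payoff by deviating while the other player's action is fixed."""
    r, c = PAYOFF[(a_row, a_col)]
    row_ok = all(PAYOFF[(d, a_col)][0] <= r for d in ACTIONS)
    col_ok = all(PAYOFF[(a_row, d)][1] <= c for d in ACTIONS)
    return row_ok and col_ok

equilibria = [ac for ac in product(ACTIONS, ACTIONS) if is_nash(*ac)]
print(equilibria)  # [('defect', 'defect')]: mutual defection,
                   # even though (cooperate, cooperate) pays both players more
```

This also previews the second objective: independent best-response learners are pulled toward `(defect, defect)`, illustrating how self-interested learning can settle on an outcome worse for everyone than the cooperative one.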

March 10, 2026 · 4 min · 672 words · codefrydev