Volume 9: Multi-Agent RL (MARL)
Chapters 81–90 — Multi-agent fundamentals, game theory, IQL, CTDE, MADDPG, VDN, QMIX, MAPPO, self-play, communication.
Chapter 81: Model Rock-Paper-Scissors as a Dec-POMDP.
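A minimal sketch of the tuple ingredients (agents, actions, observations, rewards) for one-shot Rock-Paper-Scissors. One caveat: a strict Dec-POMDP has a single shared team reward, so zero-sum RPS fits more naturally into the general Markov-game tuple; the sketch below keeps per-agent rewards. All names and the `step` API are illustrative, not a fixed library.

```python
import numpy as np

# Illustrative encoding of one-shot Rock-Paper-Scissors as a
# (agents, actions, observations, rewards) tuple. Names are hypothetical.
AGENTS = ("player_0", "player_1")
ACTIONS = ("rock", "paper", "scissors")

# Payoff matrix for player_0; player_1 receives the negation (zero-sum).
PAYOFF = np.array([
    [ 0, -1,  1],   # rock     vs rock / paper / scissors
    [ 1,  0, -1],   # paper
    [-1,  1,  0],   # scissors
])

def step(a0: int, a1: int):
    """One joint step: per-agent rewards and (trivial) local observations."""
    r0 = PAYOFF[a0, a1]
    rewards = {"player_0": int(r0), "player_1": int(-r0)}
    # Each agent observes only its own action (partial observability).
    observations = {"player_0": a0, "player_1": a1}
    return observations, rewards

obs, rew = step(ACTIONS.index("rock"), ACTIONS.index("scissors"))
print(rew)  # {'player_0': 1, 'player_1': -1}
```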
Chapter 82: Find the Nash equilibrium of a 2×2 matrix game; examine what outcome independent learners converge to.
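For the 2×2 case the mixed equilibrium has a closed form from the indifference conditions. A sketch using Matching Pennies as the worked example; the formula assumes a fully mixed equilibrium exists (the denominator is zero otherwise).

```python
import numpy as np

# Mixed-strategy Nash equilibrium of a 2x2 zero-sum game via indifference.
A = np.array([[ 1, -1],
              [-1,  1]])  # row player's payoffs for Matching Pennies

def mixed_nash_2x2(A):
    # Row plays (p, 1-p) so the column player is indifferent:
    # p*A[0,0] + (1-p)*A[1,0] == p*A[0,1] + (1-p)*A[1,1]
    denom = A[0, 0] - A[1, 0] - A[0, 1] + A[1, 1]
    p = (A[1, 1] - A[1, 0]) / denom
    # Column plays (q, 1-q) to make the row player indifferent.
    q = (A[1, 1] - A[0, 1]) / denom
    return p, q

print(mixed_nash_2x2(A))  # (0.5, 0.5): both players randomize uniformly
```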
Chapter 83: Run independent Q-learning (IQL) in a cooperative meet-up game; observe the non-stationarity each learner creates for the others.
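A runnable toy version of the exercise, assuming a 1-D meet-up world of five cells where both agents are rewarded only for occupying the same cell; the environment and all hyperparameters are illustrative. Because each Q-table conditions only on its owner's position, the other agent's evolving policy makes the world look non-stationary from each learner's point of view.

```python
import numpy as np

# Independent Q-learning on a tiny cooperative "meet-up" line world.
rng = np.random.default_rng(0)
N, ACTIONS = 5, 3               # cells; actions: left, stay, right
MOVES = np.array([-1, 0, 1])

# Each agent learns its own Q-table over ITS OWN position only.
Q = [np.zeros((N, ACTIONS)), np.zeros((N, ACTIONS))]
eps, alpha, gamma = 0.1, 0.1, 0.95

for episode in range(2000):
    pos = rng.integers(0, N, size=2)
    for t in range(20):
        acts = []
        for i in range(2):
            if rng.random() < eps:
                acts.append(rng.integers(ACTIONS))
            else:
                acts.append(int(np.argmax(Q[i][pos[i]])))
        new_pos = np.clip(pos + MOVES[acts], 0, N - 1)
        r = 1.0 if new_pos[0] == new_pos[1] else 0.0   # shared team reward
        for i in range(2):
            target = r + gamma * Q[i][new_pos[i]].max()
            Q[i][pos[i], acts[i]] += alpha * (target - Q[i][pos[i], acts[i]])
        pos = new_pos
        if r > 0:
            break

print("greedy action per cell:",
      [np.argmax(Q[0], axis=1).tolist(), np.argmax(Q[1], axis=1).tolist()])
```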
Chapter 84: Explain centralized training with decentralized execution (CTDE) with an example; show why it mitigates non-stationarity.
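A skeleton of the CTDE split, with illustrative shapes rather than any specific paper's architecture: the critic consumes the joint observation and joint action during training, while each actor needs only its own local observation at execution time.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2   # illustrative dimensions

class Actor(nn.Module):            # decentralized: local obs -> action
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):    # centralized: all obs + all actions -> Q
    def __init__(self):
        super().__init__()
        in_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
obs = torch.randn(4, N_AGENTS, OBS_DIM)        # batch of joint observations
acts = torch.stack([actors[i](obs[:, i]) for i in range(N_AGENTS)], dim=1)
q = CentralCritic()(obs.flatten(1), acts.flatten(1))
print(q.shape)  # torch.Size([4, 1])
```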
Chapter 85: Train MADDPG on simple spread; centralized critics see all observations and actions, while actors stay decentralized.
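A self-contained sketch of one MADDPG critic update under that CTDE split, with fake replay data standing in for simple-spread transitions; network sizes, hyperparameters, and the omitted actor update are all illustrative simplifications.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# y = r + gamma * Q'(o', a_1', ..., a_N') with target actors and critic.
N, OBS, ACT, B = 3, 8, 2, 32    # agents, obs dim, act dim, batch size
gamma = 0.95

def mlp(i, o):
    return nn.Sequential(nn.Linear(i, 64), nn.ReLU(), nn.Linear(64, o))

actors  = [mlp(OBS, ACT) for _ in range(N)]
targets = [mlp(OBS, ACT) for _ in range(N)]          # target actors
critic, critic_t = mlp(N * (OBS + ACT), 1), mlp(N * (OBS + ACT), 1)
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# Fake replay batch: joint obs, joint actions, team reward, next joint obs.
o  = torch.randn(B, N, OBS); a  = torch.randn(B, N, ACT)
r  = torch.randn(B, 1);      o2 = torch.randn(B, N, OBS)

with torch.no_grad():
    a2 = torch.stack([targets[j](o2[:, j]) for j in range(N)], dim=1)
    y = r + gamma * critic_t(torch.cat([o2.flatten(1), a2.flatten(1)], -1))

q = critic(torch.cat([o.flatten(1), a.flatten(1)], -1))
loss = F.mse_loss(q, y)         # centralized critic regression target
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```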
Chapter 86: VDN sums per-agent Q-values into the joint Q-value; compare with IQL.
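The decomposition itself is one line: Q_tot(s, a) = sum_i Q_i(o_i, a_i). A sketch with illustrative dimensions; the closing comment marks the key difference from IQL.

```python
import torch
import torch.nn as nn

N, OBS, N_ACTIONS, B = 2, 6, 4, 16   # illustrative dimensions

q_nets = [nn.Sequential(nn.Linear(OBS, 32), nn.ReLU(),
                        nn.Linear(32, N_ACTIONS)) for _ in range(N)]

obs = torch.randn(B, N, OBS)
acts = torch.randint(0, N_ACTIONS, (B, N))

# Each agent's chosen-action value, summed into the joint value.
q_i = torch.stack([q_nets[i](obs[:, i]).gather(1, acts[:, i:i+1])
                   for i in range(N)], dim=1)      # (B, N, 1)
q_tot = q_i.sum(dim=1)                             # (B, 1), trained like DQN
print(q_tot.shape)

# Unlike IQL, the TD target is computed on q_tot with the TEAM reward,
# so credit flows back through every agent's network jointly.
```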
Chapter 87: QMIX combines per-agent Q-values through a mixing network, with monotonicity enforced by hypernetworks.
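A sketch of the mixing network with illustrative dimensions: hypernetworks map the global state to the mixer's weights, and taking the absolute value keeps those weights non-negative, which is what enforces dQ_tot/dQ_i >= 0.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, STATE, EMBED, B = 3, 10, 32, 16   # agents, state dim, embed dim, batch

class QMixer(nn.Module):
    def __init__(self):
        super().__init__()
        self.w1 = nn.Linear(STATE, N * EMBED)   # hypernet: state -> W1
        self.b1 = nn.Linear(STATE, EMBED)
        self.w2 = nn.Linear(STATE, EMBED)       # hypernet: state -> W2
        self.b2 = nn.Sequential(nn.Linear(STATE, EMBED), nn.ReLU(),
                                nn.Linear(EMBED, 1))

    def forward(self, q_agents, state):          # q_agents: (B, N)
        bsz = q_agents.size(0)
        w1 = torch.abs(self.w1(state)).view(bsz, N, EMBED)  # non-negative
        b1 = self.b1(state).view(bsz, 1, EMBED)
        h = F.elu(q_agents.unsqueeze(1) @ w1 + b1)          # (B, 1, EMBED)
        w2 = torch.abs(self.w2(state)).view(bsz, EMBED, 1)  # non-negative
        return (h @ w2).view(bsz, 1) + self.b2(state)       # Q_tot: (B, 1)

mixer = QMixer()
q_tot = mixer(torch.randn(B, N), torch.randn(B, STATE))
print(q_tot.shape)  # torch.Size([16, 1])
```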
Chapter 88: MAPPO with parameter sharing and a centralized value function; compare with IPPO.
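A sketch of the two pieces parameter sharing touches, with illustrative shapes: one actor reused by every agent (disambiguated by a one-hot agent ID appended to the observation) and a value head that reads the global state. PPO training would sit on top; the closing comment notes where IPPO differs.

```python
import torch
import torch.nn as nn

N, OBS, STATE, N_ACTIONS, B = 3, 8, 24, 5, 16   # illustrative dimensions

shared_actor = nn.Sequential(nn.Linear(OBS + N, 64), nn.ReLU(),
                             nn.Linear(64, N_ACTIONS))
central_value = nn.Sequential(nn.Linear(STATE, 64), nn.ReLU(),
                              nn.Linear(64, 1))

obs = torch.randn(B, N, OBS)
state = torch.randn(B, STATE)                   # global state for the critic

agent_ids = torch.eye(N).expand(B, N, N)        # one-hot ID per agent
logits = shared_actor(torch.cat([obs, agent_ids], dim=-1))  # (B, N, A)
dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample()                          # (B, N)
values = central_value(state)                    # (B, 1), centralized baseline

print(actions.shape, values.shape)
# IPPO differs only in the critic: each agent's value conditions on its
# own local observation instead of the global state.
```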
Chapter 89: Self-play in Tic-Tac-Toe; track Elo ratings over training.
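The standard Elo update is a few lines; the K-factor and the example ratings below are illustrative.

```python
# Elo rating update for evaluating self-play snapshots against each other.
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """score_a: 1 for an A win, 0.5 for a draw, 0 for an A loss."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Current snapshot beats an older frozen snapshot of equal rating:
print(elo_update(1200.0, 1200.0, 1.0))  # (1216.0, 1184.0)
```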
Chapter 90: Agents output a message plus an action; train them on a coordination task that requires communication.
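A sketch of the message-plus-action interface in the style of differentiable-communication methods such as DIAL or CommNet; the class name, sizes, and the two-agent loop are illustrative.

```python
import torch
import torch.nn as nn

OBS, MSG, N_ACTIONS = 6, 4, 3   # illustrative dimensions

class SpeakerListener(nn.Module):
    """Emits an action AND a small continuous message each step."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS + MSG, 64), nn.ReLU())
        self.action_head = nn.Linear(64, N_ACTIONS)
        self.message_head = nn.Linear(64, MSG)

    def forward(self, obs, incoming_msg):
        h = self.body(torch.cat([obs, incoming_msg], dim=-1))
        return self.action_head(h), torch.tanh(self.message_head(h))

agents = [SpeakerListener(), SpeakerListener()]
msgs = [torch.zeros(1, MSG), torch.zeros(1, MSG)]   # empty channel at t=0
for t in range(3):
    # Each agent reads the OTHER agent's last message, so gradients for
    # the listener can flow back through the channel during training.
    outs = [agents[i](torch.randn(1, OBS), msgs[1 - i]) for i in range(2)]
    msgs = [m for (_, m) in outs]                   # cross the channel
    print(t, [logits.argmax(-1).item() for (logits, _) in outs])
```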
Review Volume 9 (Multi-Agent RL, game theory, QMIX, MAPPO) and preview Volume 10 (Real-World RL — safety, alignment, LLMs, deployment).