Volume 9: Multi-Agent RL (MARL)
Chapters 81–90 — Multi-agent fundamentals, game theory, IQL, CTDE, MADDPG, VDN, QMIX, MAPPO, self-play, communication.
Chapter 81: Model Rock-Paper-Scissors as a Dec-POMDP.
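A minimal sketch of the tuple ingredients (agents, actions, observations, rewards) for one-shot Rock-Paper-Scissors. One caveat: a strict Dec-POMDP has a single shared team reward, so zero-sum RPS fits more naturally into the general Markov-game tuple; the sketch below keeps per-agent rewards. All names and the `step` API are illustrative, not a fixed library.

```python
import numpy as np

# Illustrative encoding of one-shot Rock-Paper-Scissors as a
# (agents, actions, observations, rewards) tuple. Names are hypothetical.
AGENTS = ("player_0", "player_1")
ACTIONS = ("rock", "paper", "scissors")

# Payoff matrix for player_0; player_1 receives the negation (zero-sum).
PAYOFF = np.array([
    [ 0, -1,  1],   # rock     vs rock / paper / scissors
    [ 1,  0, -1],   # paper
    [-1,  1,  0],   # scissors
])

def step(a0: int, a1: int):
    """One joint step: per-agent rewards and (trivial) local observations."""
    r0 = PAYOFF[a0, a1]
    rewards = {"player_0": int(r0), "player_1": int(-r0)}
    # Each agent observes only its own action (partial observability).
    observations = {"player_0": a0, "player_1": a1}
    return observations, rewards

obs, rew = step(ACTIONS.index("rock"), ACTIONS.index("scissors"))
print(rew)  # {'player_0': 1, 'player_1': -1}
```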
Chapter 82: Find the Nash equilibrium of a 2×2 matrix game; examine what outcome independent learners converge to.
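For the 2×2 case the mixed equilibrium has a closed form from the indifference conditions. A sketch using Matching Pennies as the worked example; the formula assumes a fully mixed equilibrium exists (the denominator is zero otherwise).

```python
import numpy as np

# Mixed-strategy Nash equilibrium of a 2x2 zero-sum game via indifference.
A = np.array([[ 1, -1],
              [-1,  1]])  # row player's payoffs for Matching Pennies

def mixed_nash_2x2(A):
    # Row plays (p, 1-p) so the column player is indifferent:
    # p*A[0,0] + (1-p)*A[1,0] == p*A[0,1] + (1-p)*A[1,1]
    denom = A[0, 0] - A[1, 0] - A[0, 1] + A[1, 1]
    p = (A[1, 1] - A[1, 0]) / denom
    # Column plays (q, 1-q) to make the row player indifferent.
    q = (A[1, 1] - A[0, 1]) / denom
    return p, q

print(mixed_nash_2x2(A))  # (0.5, 0.5): both players randomize uniformly
```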
Chapter 83: Run independent Q-learning (IQL) in a cooperative meet-up game; observe the non-stationarity each learner creates for the others.
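A runnable toy version of the exercise, assuming a 1-D meet-up world of five cells where both agents are rewarded only for occupying the same cell; the environment and all hyperparameters are illustrative. Because each Q-table conditions only on its owner's position, the other agent's evolving policy makes the world look non-stationary from each learner's point of view.

```python
import numpy as np

# Independent Q-learning on a tiny cooperative "meet-up" line world.
rng = np.random.default_rng(0)
N, ACTIONS = 5, 3               # cells; actions: left, stay, right
MOVES = np.array([-1, 0, 1])

# Each agent learns its own Q-table over ITS OWN position only.
Q = [np.zeros((N, ACTIONS)), np.zeros((N, ACTIONS))]
eps, alpha, gamma = 0.1, 0.1, 0.95

for episode in range(2000):
    pos = rng.integers(0, N, size=2)
    for t in range(20):
        acts = []
        for i in range(2):
            if rng.random() < eps:
                acts.append(rng.integers(ACTIONS))
            else:
                acts.append(int(np.argmax(Q[i][pos[i]])))
        new_pos = np.clip(pos + MOVES[acts], 0, N - 1)
        r = 1.0 if new_pos[0] == new_pos[1] else 0.0   # shared team reward
        for i in range(2):
            target = r + gamma * Q[i][new_pos[i]].max()
            Q[i][pos[i], acts[i]] += alpha * (target - Q[i][pos[i], acts[i]])
        pos = new_pos
        if r > 0:
            break

print("greedy action per cell:",
      [np.argmax(Q[0], axis=1).tolist(), np.argmax(Q[1], axis=1).tolist()])
```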
Chapter 84: Explain centralized training with decentralized execution (CTDE) with an example; show why it mitigates non-stationarity.
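A skeleton of the CTDE split, with illustrative shapes rather than any specific paper's architecture: the critic consumes the joint observation and joint action during training, while each actor needs only its own local observation at execution time.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2   # illustrative dimensions

class Actor(nn.Module):            # decentralized: local obs -> action
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):    # centralized: all obs + all actions -> Q
    def __init__(self):
        super().__init__()
        in_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
obs = torch.randn(4, N_AGENTS, OBS_DIM)        # batch of joint observations
acts = torch.stack([actors[i](obs[:, i]) for i in range(N_AGENTS)], dim=1)
q = CentralCritic()(obs.flatten(1), acts.flatten(1))
print(q.shape)  # torch.Size([4, 1])
```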
Chapter 85: Train MADDPG on simple spread; centralized critics see all observations and actions, while actors stay decentralized.
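A self-contained sketch of one MADDPG critic update under that CTDE split, with fake replay data standing in for simple-spread transitions; network sizes, hyperparameters, and the omitted actor update are all illustrative simplifications.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# y = r + gamma * Q'(o', a_1', ..., a_N') with target actors and critic.
N, OBS, ACT, B = 3, 8, 2, 32    # agents, obs dim, act dim, batch size
gamma = 0.95

def mlp(i, o):
    return nn.Sequential(nn.Linear(i, 64), nn.ReLU(), nn.Linear(64, o))

actors  = [mlp(OBS, ACT) for _ in range(N)]
targets = [mlp(OBS, ACT) for _ in range(N)]          # target actors
critic, critic_t = mlp(N * (OBS + ACT), 1), mlp(N * (OBS + ACT), 1)
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# Fake replay batch: joint obs, joint actions, team reward, next joint obs.
o  = torch.randn(B, N, OBS); a  = torch.randn(B, N, ACT)
r  = torch.randn(B, 1);      o2 = torch.randn(B, N, OBS)

with torch.no_grad():
    a2 = torch.stack([targets[j](o2[:, j]) for j in range(N)], dim=1)
    y = r + gamma * critic_t(torch.cat([o2.flatten(1), a2.flatten(1)], -1))

q = critic(torch.cat([o.flatten(1), a.flatten(1)], -1))
loss = F.mse_loss(q, y)         # centralized critic regression target
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```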
Chapter 86: VDN sums per-agent Q-values into the joint Q-value; compare with IQL.
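The decomposition itself is one line: Q_tot(s, a) = sum_i Q_i(o_i, a_i). A sketch with illustrative dimensions; the closing comment marks the key difference from IQL.

```python
import torch
import torch.nn as nn

N, OBS, N_ACTIONS, B = 2, 6, 4, 16   # illustrative dimensions

q_nets = [nn.Sequential(nn.Linear(OBS, 32), nn.ReLU(),
                        nn.Linear(32, N_ACTIONS)) for _ in range(N)]

obs = torch.randn(B, N, OBS)
acts = torch.randint(0, N_ACTIONS, (B, N))

# Each agent's chosen-action value, summed into the joint value.
q_i = torch.stack([q_nets[i](obs[:, i]).gather(1, acts[:, i:i+1])
                   for i in range(N)], dim=1)      # (B, N, 1)
q_tot = q_i.sum(dim=1)                             # (B, 1), trained like DQN
print(q_tot.shape)

# Unlike IQL, the TD target is computed on q_tot with the TEAM reward,
# so credit flows back through every agent's network jointly.
```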
Chapter 87: QMIX combines per-agent Q-values through a mixing network, with monotonicity enforced by hypernetworks.
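A sketch of the mixing network with illustrative dimensions: hypernetworks map the global state to the mixer's weights, and taking the absolute value keeps those weights non-negative, which is what enforces dQ_tot/dQ_i >= 0.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, STATE, EMBED, B = 3, 10, 32, 16   # agents, state dim, embed dim, batch

class QMixer(nn.Module):
    def __init__(self):
        super().__init__()
        self.w1 = nn.Linear(STATE, N * EMBED)   # hypernet: state -> W1
        self.b1 = nn.Linear(STATE, EMBED)
        self.w2 = nn.Linear(STATE, EMBED)       # hypernet: state -> W2
        self.b2 = nn.Sequential(nn.Linear(STATE, EMBED), nn.ReLU(),
                                nn.Linear(EMBED, 1))

    def forward(self, q_agents, state):          # q_agents: (B, N)
        bsz = q_agents.size(0)
        w1 = torch.abs(self.w1(state)).view(bsz, N, EMBED)  # non-negative
        b1 = self.b1(state).view(bsz, 1, EMBED)
        h = F.elu(q_agents.unsqueeze(1) @ w1 + b1)          # (B, 1, EMBED)
        w2 = torch.abs(self.w2(state)).view(bsz, EMBED, 1)  # non-negative
        return (h @ w2).view(bsz, 1) + self.b2(state)       # Q_tot: (B, 1)

mixer = QMixer()
q_tot = mixer(torch.randn(B, N), torch.randn(B, STATE))
print(q_tot.shape)  # torch.Size([16, 1])
```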
Chapter 88: MAPPO with parameter sharing and a centralized value function; compare with IPPO.
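A sketch of the two pieces parameter sharing touches, with illustrative shapes: one actor reused by every agent (disambiguated by a one-hot agent ID appended to the observation) and a value head that reads the global state. PPO training would sit on top; the closing comment notes where IPPO differs.

```python
import torch
import torch.nn as nn

N, OBS, STATE, N_ACTIONS, B = 3, 8, 24, 5, 16   # illustrative dimensions

shared_actor = nn.Sequential(nn.Linear(OBS + N, 64), nn.ReLU(),
                             nn.Linear(64, N_ACTIONS))
central_value = nn.Sequential(nn.Linear(STATE, 64), nn.ReLU(),
                              nn.Linear(64, 1))

obs = torch.randn(B, N, OBS)
state = torch.randn(B, STATE)                   # global state for the critic

agent_ids = torch.eye(N).expand(B, N, N)        # one-hot ID per agent
logits = shared_actor(torch.cat([obs, agent_ids], dim=-1))  # (B, N, A)
dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample()                          # (B, N)
values = central_value(state)                    # (B, 1), centralized baseline

print(actions.shape, values.shape)
# IPPO differs only in the critic: each agent's value conditions on its
# own local observation instead of the global state.
```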
Chapter 89: Self-play in Tic-Tac-Toe; track Elo ratings over training.
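The standard Elo update is a few lines; the K-factor and the example ratings below are illustrative.

```python
# Elo rating update for evaluating self-play snapshots against each other.
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """score_a: 1 for an A win, 0.5 for a draw, 0 for an A loss."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Current snapshot beats an older frozen snapshot of equal rating:
print(elo_update(1200.0, 1200.0, 1.0))  # (1216.0, 1184.0)
```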
Chapter 90: Agents output a message plus an action; train them on a coordination task that requires communication.
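A sketch of the message-plus-action interface in the style of differentiable-communication methods such as DIAL or CommNet; the class name, sizes, and the two-agent loop are illustrative.

```python
import torch
import torch.nn as nn

OBS, MSG, N_ACTIONS = 6, 4, 3   # illustrative dimensions

class SpeakerListener(nn.Module):
    """Emits an action AND a small continuous message each step."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS + MSG, 64), nn.ReLU())
        self.action_head = nn.Linear(64, N_ACTIONS)
        self.message_head = nn.Linear(64, MSG)

    def forward(self, obs, incoming_msg):
        h = self.body(torch.cat([obs, incoming_msg], dim=-1))
        return self.action_head(h), torch.tanh(self.message_head(h))

agents = [SpeakerListener(), SpeakerListener()]
msgs = [torch.zeros(1, MSG), torch.zeros(1, MSG)]   # empty channel at t=0
for t in range(3):
    # Each agent reads the OTHER agent's last message, so gradients for
    # the listener can flow back through the channel during training.
    outs = [agents[i](torch.randn(1, OBS), msgs[1 - i]) for i in range(2)]
    msgs = [m for (_, m) in outs]                   # cross the channel
    print(t, [logits.argmax(-1).item() for (logits, _) in outs])
```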
Review Volume 9 (Multi-Agent RL, game theory, QMIX, MAPPO) and preview Volume 10 (Real-World RL — safety, alignment, LLMs, deployment).