Chapter 81: Multi-Agent Fundamentals

Learning objectives
- Model a two-player zero-sum game (e.g. Rock-Paper-Scissors) as a Dec-POMDP (Decentralized Partially Observable MDP) or an equivalent multi-agent framework.
- Define states, observations, actions, and rewards for each agent in the game.
- Explain the difference between centralized (one controller sees everything) and decentralized (each agent has its own observation and policy) formulations.
- Identify how the same game can be viewed both as a normal-form game (payoff matrix) and as a sequential Dec-POMDP (if we add structure).
- Relate multi-agent modeling to game AI (opponents, teammates) and trading (multiple market participants).

Concept and real-world RL ...
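The normal-form view of Rock-Paper-Scissors can be made concrete with its payoff matrix. The values below are the standard ones for the game; the function names and the uniform-strategy check are an illustrative sketch, not code from the chapter:

```python
# Rock-Paper-Scissors as a two-player zero-sum normal-form game.
ACTIONS = ["rock", "paper", "scissors"]

# PAYOFF_P1[i][j] = payoff to player 1 when P1 plays ACTIONS[i] and P2 plays ACTIONS[j].
PAYOFF_P1 = [
    [0, -1,  1],   # rock vs (rock, paper, scissors)
    [1,  0, -1],   # paper
    [-1, 1,  0],   # scissors
]

def payoffs(i, j):
    """Joint payoff; zero-sum means P2's payoff is the negation of P1's."""
    u1 = PAYOFF_P1[i][j]
    return u1, -u1

# Sanity check: payoffs always sum to zero for every joint action.
assert all(sum(payoffs(i, j)) == 0 for i in range(3) for j in range(3))

# The uniform mixed strategy is the Nash equilibrium of RPS; its expected payoff is 0.
uniform = [1 / 3] * 3
value = sum(uniform[i] * uniform[j] * PAYOFF_P1[i][j]
            for i in range(3) for j in range(3))
print(round(value, 10))
```

The same matrix becomes a (degenerate, one-step) Dec-POMDP by treating the joint action as the transition trigger and the payoff pair as the per-agent rewards.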

March 10, 2026 · 4 min · 673 words · codefrydev

Chapter 82: Game Theory Basics for RL

Learning objectives
- Compute the Nash equilibrium of a simple 2×2 game (e.g. the Prisoner’s Dilemma) from its payoff matrix.
- Explain why independent learning (each agent learns its best response without knowing the other’s policy) might converge to an outcome that is not a Nash equilibrium, or might not converge at all.
- Compare Nash equilibrium payoffs with the payoffs that result from independent Q-learning or gradient-based learning in the same game.
- Identify the difference between cooperative, competitive, and mixed settings in terms of payoff structure.
- Relate game theory to game AI (opponent modeling) and trading (market equilibrium).

Concept and real-world RL ...
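For a 2×2 game, a pure Nash equilibrium can be found by brute-force best-response checks. The Prisoner's Dilemma payoff values below are a common textbook choice, not taken from the chapter:

```python
# Prisoner's Dilemma payoffs (hypothetical standard values).
# Action 0 = cooperate, 1 = defect; U1[a1][a2] is player 1's payoff, U2 player 2's.
U1 = [[3, 0],
      [5, 1]]
U2 = [[3, 5],
      [0, 1]]

def is_nash(a1, a2):
    """A pure profile is Nash iff neither player gains by deviating unilaterally."""
    best1 = all(U1[a1][a2] >= U1[d][a2] for d in (0, 1))
    best2 = all(U2[a1][a2] >= U2[a1][d] for d in (0, 1))
    return best1 and best2

equilibria = [(a1, a2) for a1 in (0, 1) for a2 in (0, 1) if is_nash(a1, a2)]
print(equilibria)  # (defect, defect) is the unique pure equilibrium: [(1, 1)]
```

Note that mutual defection is the equilibrium even though mutual cooperation pays both players more, which is exactly the gap between equilibrium outcomes and jointly optimal outcomes that later cooperative algorithms try to close.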

March 10, 2026 · 4 min · 672 words · codefrydev

Chapter 83: Independent Q-Learning (IQL)

Learning objectives
- Implement independent Q-learning (IQL) in a simple cooperative game (e.g. two agents must “meet” in the same cell or coordinate to achieve a joint goal).
- Observe the non-stationarity problem: as one agent’s policy changes, the transitions and rewards seen from the other agent’s perspective change, so the environment appears non-stationary.
- Explain why IQL can still work in some cooperative settings despite non-stationarity, and when it fails or converges slowly.
- Compare IQL with a baseline (e.g. random or hand-coded coordination) on the meet-up or a similar task.
- Relate IQL and non-stationarity to game AI (teammates) and dialogue (multiple agents).

Concept and real-world RL ...

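A minimal version of the meet-up task reduces to a stateless two-action coordination game: reward 1 iff both agents pick the same cell. The sketch below runs two independent ε-greedy Q-learners on it; the hyperparameters and the bandit-style simplification are my own, not the chapter's:

```python
import random

random.seed(0)
N_ACTIONS = 2          # two cells to "meet" in; reward 1 iff both pick the same cell
q1 = [0.0] * N_ACTIONS # agent 1's independent Q-table (stateless bandit view)
q2 = [0.0] * N_ACTIONS
alpha, eps = 0.1, 0.1

def choose(q):
    """Epsilon-greedy action selection over a small Q-table."""
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q[a])

for _ in range(5000):
    a1, a2 = choose(q1), choose(q2)
    r = 1.0 if a1 == a2 else 0.0  # shared team reward
    # Each agent updates as if the other were part of the environment —
    # this is the source of non-stationarity: the "environment" shifts as the
    # teammate's policy shifts.
    q1[a1] += alpha * (r - q1[a1])
    q2[a2] += alpha * (r - q2[a2])

print(q1, q2)  # the greedy actions typically coincide: coordination emerges
```

Here IQL succeeds because either meeting point is a stable equilibrium once both agents lean toward it; tasks that require passing through low-reward joint actions are where independent learners stall.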

March 10, 2026 · 4 min · 715 words · codefrydev

Chapter 85: Multi-Agent DDPG (MADDPG)

Learning objectives
- Implement MADDPG for the Multi-Agent Particle Environment (e.g. “simple spread”): each agent has a decentralized actor (policy π_i(o_i) or π_i(s_i)) and a centralized critic Q_i(s, a_1,…,a_n) that takes the full state and all actions.
- Train the critics with TD targets using (s, a_1,…,a_n), and the actors with the gradient of Q_i w.r.t. agent i’s action (DDPG-style).
- Explain why centralized critics help: each Q_i conditions on the full state and joint action, so the critic sees a stationary learning problem; the actor for agent i is updated to maximize Q_i(s, a_1,…,a_i,…,a_n) by changing a_i (with a_i = π_i(o_i) at execution).
- Run on “simple spread” (or similar) and report coordination behavior and return.
- Relate MADDPG to robot navigation (multi-robot) and game AI (cooperative or competitive).

Concept and real-world RL ...
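The actor-update idea — ascend the centralized critic's gradient with respect to your own action while the others' actions are held fixed — can be shown without neural networks. In this toy sketch the critic Q is given in closed form and finite differences stand in for backprop; in real MADDPG the critic is learned from TD targets and the problem comes from the particle environment. Everything here (the target, learning rate, quadratic critic) is a hypothetical stand-in:

```python
# MADDPG-flavored toy: two agents with scalar actions must meet at TARGET.
TARGET = 1.0

def Q(a1, a2):
    """Centralized team critic: high when both actions are near the target
    and near each other. Hand-written here; learned via TD in real MADDPG."""
    return -((a1 - TARGET) ** 2 + (a2 - TARGET) ** 2 + (a1 - a2) ** 2)

# Deterministic "policies": each actor outputs a constant action theta[i].
theta = [0.0, 2.0]
lr, h = 0.05, 1e-5

for _ in range(500):
    a1, a2 = theta
    # DDPG-style actor update: ascend dQ/da_i with the other agent's action
    # held fixed (finite differences replace autograd through the critic).
    g1 = (Q(a1 + h, a2) - Q(a1 - h, a2)) / (2 * h)
    g2 = (Q(a1, a2 + h) - Q(a1, a2 - h)) / (2 * h)
    theta[0] += lr * g1
    theta[1] += lr * g2

print(theta)  # both actions converge near the target 1.0
```

Because each gradient is taken through a critic that sees the joint action, neither agent's update treats the other as environment noise — the stationarity argument from the objectives in miniature.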

March 10, 2026 · 4 min · 652 words · codefrydev

Chapter 86: Value Decomposition Networks (VDN)

Learning objectives
- Implement VDN: for a cooperative game, define the joint Q as the sum of individual Q-values: Q_tot(s, a_1,…,a_n) = Q_1(o_1, a_1) + … + Q_n(o_n, a_n).
- Train with a joint reward (e.g. a team reward): run TD on Q_tot so that the sum of individual Qs approximates the joint return; backprop distributes the gradient to each Q_i.
- Compare VDN with IQL (each agent trains Q_i on a local or team reward without factorization) in terms of learning speed and final return.
- Explain the limitation of VDN: additivity may not hold for all tasks (e.g. when there are strong synergies or redundancies between agents).
- Relate VDN to game AI (team games) and robot navigation (multi-robot coordination).

Concept and real-world RL ...
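In the tabular case the "backprop distributes the gradient" step is especially simple: since ∂Q_tot/∂Q_i = 1, every component receives the same TD error. The one-step cooperative game and reward values below are hypothetical, chosen so the additive factorization happens to recover the optimal joint action:

```python
import random

random.seed(1)

# Cooperative one-step game: team reward for the joint action (hypothetical values).
R = [[5.0, 0.0],
     [0.0, 8.0]]  # R[a1][a2]; the best joint action is (1, 1)

q1 = [0.0, 0.0]  # individual utilities; Q_tot(a1, a2) = q1[a1] + q2[a2]
q2 = [0.0, 0.0]
alpha = 0.1

for _ in range(3000):
    a1, a2 = random.randrange(2), random.randrange(2)  # uniform exploration
    delta = R[a1][a2] - (q1[a1] + q2[a2])  # TD error on the factored joint Q
    # The sum has unit partial derivatives, so each component gets the same signal.
    q1[a1] += alpha * delta
    q2[a2] += alpha * delta

# Decentralized greedy execution: each agent argmaxes its own utility.
print(q1, q2)
```

The individual Qs converge to the best additive fit of R, and both greedy actions land on 1, matching the joint optimum. Swapping in a payoff matrix with strong cross-agent synergies (where no additive fit ranks the joint actions correctly) demonstrates the limitation listed above.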

March 10, 2026 · 4 min · 684 words · codefrydev

Chapter 87: QMIX Algorithm

Learning objectives
- Implement QMIX: a mixing network that takes the agent Q-values (Q_1,…,Q_n) and the global state s and outputs a joint Q_tot, with the monotonicity constraint ∂Q_tot/∂Q_i ≥ 0 so that the argmax over the joint action decomposes into per-agent argmaxes.
- Enforce monotonicity by generating the mixing weights with hypernetworks that take s and output positive weights (e.g. the absolute value of the network outputs).
- Train with TD on Q_tot using the joint reward; backprop through the mixing network to update both the mixing weights and the individual Q_i.
- Test on a cooperative task and compare with VDN and IQL.
- Relate QMIX to game AI (StarCraft, team coordination) and robot navigation (multi-robot).

Concept and real-world RL ...
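The monotonicity constraint is what licenses decentralized execution: if Q_tot is non-decreasing in every Q_i, the joint argmax factorizes into per-agent argmaxes. The check below hard-codes positive, state-dependent mixing weights where QMIX would use a hypernetwork; the utilities and weight values are made up for illustration:

```python
import itertools

# Hypothetical per-agent utilities over two actions each.
q1 = {0: 1.0, 1: 3.0}
q2 = {0: 2.0, 1: 0.5}

def q_tot(v1, v2, state):
    """Monotone state-conditioned mixer. In QMIX the weights come from
    hypernetworks fed the global state; here they are hand-set and positive."""
    w1, w2, b = 0.5 + state, 1.0, -0.2
    assert w1 >= 0 and w2 >= 0  # monotonicity: dQ_tot/dQ_i >= 0
    return w1 * v1 + w2 * v2 + b

state = 1.0
# Exhaustive argmax over the joint action space...
joint = max(itertools.product(q1, q2),
            key=lambda a: q_tot(q1[a[0]], q2[a[1]], state))
# ...equals the per-agent argmaxes, because mixing is monotone in each Q_i.
decentralized = (max(q1, key=q1.get), max(q2, key=q2.get))
print(joint, decentralized)  # both (1, 0)
```

VDN is the special case with fixed weights w_i = 1 and no state conditioning, which is why QMIX can represent strictly more joint value functions while keeping the same cheap decentralized argmax.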

March 10, 2026 · 4 min · 664 words · codefrydev