Multi-Agent
Model Rock-Paper-Scissors as a Markov (stochastic) game; note why the shared-reward Dec-POMDP formalism doesn't fit a zero-sum game.
Nash equilibria of 2×2 matrix games; what independent learning converges to.
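The mixed-strategy equilibrium idea from the two items above can be checked numerically. A minimal sketch (plain Python, all names illustrative): verify that the uniform mix (1/3, 1/3, 1/3) is the Nash equilibrium of Rock-Paper-Scissors by showing no pure action earns more than it against an opponent playing it.

```python
# Row player's payoff matrix for Rock-Paper-Scissors (zero-sum).
# Rows/columns are rock, paper, scissors.
PAYOFF = [
    [ 0, -1,  1],   # rock  vs rock / paper / scissors
    [ 1,  0, -1],   # paper
    [-1,  1,  0],   # scissors
]

def expected_payoffs(opponent_mix):
    """Expected payoff of each pure action against a mixed opponent strategy."""
    return [sum(PAYOFF[a][b] * opponent_mix[b] for b in range(3)) for a in range(3)]

uniform = [1/3, 1/3, 1/3]
values = expected_payoffs(uniform)
# Every pure action earns exactly 0 against the uniform mix, so no deviation
# is profitable: the uniform mix is the (unique) Nash equilibrium.
print(values)
```

Against a pure strategy the same function shows why any deterministic policy is exploitable, which is the intuition for why independent learners in RPS tend to cycle rather than converge.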
IQL in a cooperative meet-up game; why the other learner makes the environment non-stationary.
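A minimal sketch of the meet-up setting above, with all numbers illustrative: two independent Q-learners in a 2-action coordination game (reward 1 iff they pick the same spot). Each learner treats the other as part of the environment, so from either agent's view the reward for a fixed action drifts as the other's policy changes, which is exactly the non-stationarity that voids single-agent convergence guarantees.

```python
import random

random.seed(0)
ALPHA, EPS = 0.5, 0.2
q = [[0.0, 0.0], [0.0, 0.0]]   # q[agent][action], stateless bandit-style IQL

def act(agent):
    """Epsilon-greedy action from the agent's own Q-values only."""
    if random.random() < EPS:
        return random.randrange(2)
    return 0 if q[agent][0] >= q[agent][1] else 1

for _ in range(200):
    a1, a2 = act(0), act(1)
    r = 1.0 if a1 == a2 else 0.0          # shared reward: meet at the same spot
    # Each learner updates as if it were alone; the other agent's changing
    # policy is invisible to it except through the drifting reward signal.
    q[0][a1] += ALPHA * (r - q[0][a1])
    q[1][a2] += ALPHA * (r - q[1][a2])
```

In this tiny game the learners usually still coordinate, but the same ignore-the-other update is what breaks down in harder cooperative tasks.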
MADDPG on simple spread; centralized critics, decentralized actors.
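A shapes-only sketch of MADDPG's information structure (toy linear models, no training loop; every name and dimension here is illustrative): each agent's critic is centralized, seeing all agents' observations and actions, while each actor is decentralized, mapping only its own observation to its action.

```python
N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2

def actor(params, obs_i):
    """Decentralized actor: own observation -> own action (toy linear map)."""
    return [sum(w * o for w, o in zip(row, obs_i)) for row in params]

def critic(params, all_obs, all_acts):
    """Centralized critic: concatenates everyone's obs and actions -> scalar Q."""
    x = [v for obs in all_obs for v in obs] + [v for act in all_acts for v in act]
    return sum(w * v for w, v in zip(params, x))

actor_params = [[0.1] * OBS_DIM for _ in range(ACT_DIM)]
critic_params = [0.01] * (N_AGENTS * (OBS_DIM + ACT_DIM))

obs = [[1.0] * OBS_DIM for _ in range(N_AGENTS)]
acts = [actor(actor_params, o) for o in obs]   # execution needs only local obs
q = critic(critic_params, obs, acts)           # training needs the joint view
```

The design point: because the critic conditions on everyone's actions, the environment looks stationary from its perspective during training, yet nothing centralized is needed at execution time.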
VDN: joint Q as the sum of per-agent Qs; compare with IQL.
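A minimal sketch of the VDN decomposition with toy numbers (not any particular library's API): representing the joint value as Q_tot(a1, a2) = Q1(a1) + Q2(a2) means the joint argmax decomposes into independent per-agent argmaxes, so decentralized greedy execution recovers the joint greedy action; unlike IQL, Q_tot is trained against the shared team reward.

```python
Q1 = {"left": 1.0, "right": 3.0}   # toy per-agent utilities
Q2 = {"left": 2.0, "right": 0.5}

def q_tot(a1, a2):
    """VDN joint value: simple sum of per-agent utilities."""
    return Q1[a1] + Q2[a2]

# Joint greedy action found by exhaustive search over joint actions...
joint_greedy = max(((a1, a2) for a1 in Q1 for a2 in Q2), key=lambda j: q_tot(*j))
# ...equals each agent greedily maximizing its own utility alone.
decentralized = (max(Q1, key=Q1.get), max(Q2, key=Q2.get))
print(joint_greedy == decentralized)
```

This consistency between joint and per-agent argmaxes is the property QMIX generalizes beyond simple summation.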
QMIX: mixing network, monotonicity via hypernetworks.
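A hedged sketch of QMIX's monotonicity trick, with a toy stand-in for the networks (all functions and numbers here are illustrative, not the paper's architecture): a state-conditioned hypernetwork emits the mixing weights, and taking their absolute value makes them non-negative, so Q_tot is monotone in every agent utility and raising any Q_i can never lower Q_tot.

```python
def hypernet(state):
    """Toy hypernetwork: maps state features to raw mixing weights and a bias."""
    raw_w = [state[0] - 1.0, -0.5 * state[1]]   # raw weights may be negative
    bias = 0.1 * sum(state)                     # bias is unconstrained
    return raw_w, bias

def q_mix(state, agent_qs):
    raw_w, bias = hypernet(state)
    w = [abs(x) for x in raw_w]                 # abs() enforces monotonicity
    return sum(wi * qi for wi, qi in zip(w, agent_qs)) + bias

s = [0.2, 0.8]
base = q_mix(s, [1.0, 2.0])
bumped = q_mix(s, [1.5, 2.0])   # raise agent 1's utility; Q_tot cannot drop
```

Because only the weights (not the bias) are constrained, the mixer can still represent state-dependent combinations while preserving the argmax consistency that makes decentralized execution valid.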
Review Volume 8 (Offline RL, Imitation Learning, IRL, RLHF) and preview Volume 9 (Multi-Agent RL — cooperation, competition, game theory).
Review Volume 9 (Multi-Agent RL, game theory, QMIX, MAPPO) and preview Volume 10 (Real-World RL — safety, alignment, LLMs, deployment).