Chapter 87: QMIX Algorithm
Learning objectives
- Implement QMIX: a mixing network that takes the per-agent Q-values (Q_1, ..., Q_n) and the global state s and outputs a joint value Q_tot, subject to the monotonicity constraint ∂Q_tot/∂Q_i ≥ 0, which guarantees that the argmax over the joint action decomposes into per-agent argmaxes.
- Enforce monotonicity with hypernetworks that take s as input and generate non-negative mixing weights (e.g. by taking the absolute value of the hypernetwork outputs).
- Train with TD learning on Q_tot using the shared team reward; backpropagate through the mixing network to update both the mixing weights and the individual Q_i networks.
- Test on a cooperative task and compare against VDN and IQL baselines.
- Relate QMIX to game AI (StarCraft micromanagement, team coordination) and multi-robot navigation.
Concept and real-world RL ...
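The mixing network and TD objective described above can be sketched in PyTorch as follows. This is a minimal illustration, not a full QMIX implementation: the class and function names (`QMixer`, `qmix_td_loss`) and the two-layer mixer with `embed_dim=32` are choices made here for clarity, and the per-agent Q-networks and replay buffer are assumed to exist elsewhere. Monotonicity is enforced exactly as the objectives state, by passing the hypernetwork outputs through an absolute value before using them as mixing weights.

```python
import torch
import torch.nn as nn


class QMixer(nn.Module):
    """Monotonic mixing network: combines per-agent Q-values into Q_tot.

    Mixing weights are produced by hypernetworks conditioned on the global
    state s; taking their absolute value makes every weight non-negative,
    which enforces dQ_tot/dQ_i >= 0 (biases may stay unconstrained).
    """

    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: map state -> weights/biases of the two mixing layers.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        bs = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2  # (batch, 1, 1)
        return q_tot.view(bs, 1)


def qmix_td_loss(mixer, target_mixer, chosen_qs, target_max_qs,
                 state, next_state, reward, done, gamma=0.99):
    """TD loss on Q_tot with the joint reward (a sketch, shapes assumed).

    chosen_qs:     Q_i(o_i, a_i) for the actions actually taken, (batch, n_agents)
    target_max_qs: max_a Q_i(o_i', a) from the target networks, (batch, n_agents)
    reward, done:  joint reward and terminal flag, each (batch, 1)
    """
    q_tot = mixer(chosen_qs, state)
    with torch.no_grad():
        target_q_tot = target_mixer(target_max_qs, next_state)
        y = reward + gamma * (1.0 - done) * target_q_tot
    # Backprop through the mixer updates both the hypernetworks and,
    # via chosen_qs, the individual agent Q-networks.
    return ((q_tot - y) ** 2).mean()
```

Because the mixer is monotonic in each agent's Q-value, each agent can act greedily on its own Q_i at execution time while the team still acts greedily with respect to Q_tot, which is exactly the decentralized-execution property the objectives require.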