Chapter 86: Value Decomposition Networks (VDN)
Learning objectives

- Implement VDN: for a cooperative game, define the joint Q-value as the sum of the individual Q-values, Q_tot(s, a_1, …, a_n) = Q_1(o_1, a_1) + … + Q_n(o_n, a_n).
- Train with a joint (team) reward: apply TD learning to Q_tot so that the sum of individual Qs approximates the joint return; backpropagation distributes the TD error to each Q_i.
- Compare VDN with independent Q-learning (IQL), where each agent trains its own Q_i on the local or team reward without any factorization, in terms of learning speed and final return.
- Explain the limitation of VDN: the additivity assumption does not hold for all tasks, e.g. when there are strong synergies or redundancies between agents' contributions.
- Relate VDN to game AI (team games) and to robot navigation (multi-robot coordination).

Concept and real-world RL ...
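To make the factorization concrete, here is a minimal tabular sketch (not the neural-network version from the VDN paper): two agents in a stateless cooperative game with a hypothetical payoff that rewards both agents choosing action 1. Q_tot is the sum of the chosen individual Q-values, and because the gradient of the squared TD error with respect to each Q_i is identical, the same TD error updates both agents. The payoff function, table sizes, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 2
# Individual Q-tables for a stateless two-agent cooperative game.
q1 = np.zeros(n_actions)
q2 = np.zeros(n_actions)

def team_reward(a1, a2):
    # Hypothetical payoff: the team scores only when both agents
    # coordinate on action 1 (a simple AND game).
    return 1.0 if (a1 == 1 and a2 == 1) else 0.0

alpha, eps = 0.1, 0.3  # learning rate and exploration rate (assumed values)
for step in range(5000):
    # Decentralized epsilon-greedy action selection from each individual Q.
    a1 = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(q1))
    a2 = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(q2))
    r = team_reward(a1, a2)
    # VDN factorization: Q_tot is the sum of the chosen individual Q-values.
    q_tot = q1[a1] + q2[a2]
    delta = r - q_tot  # one-step TD error on the joint value (no bootstrap: stateless)
    # d(delta^2)/dQ_i is the same for every agent, so the TD error
    # is distributed equally to each selected individual Q-value.
    q1[a1] += alpha * delta
    q2[a2] += alpha * delta

print(int(np.argmax(q1)), int(np.argmax(q2)))  # decentralized greedy joint action
```

Note that the AND payoff is not exactly additive, yet the greedy decentralized policy recovered from the summed factorization is still the coordinated optimum here; an IQL baseline would simply train each q_i on the team reward independently, with no shared TD error.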