Chapter 85: Multi-Agent DDPG (MADDPG)

Learning objectives:

- Implement MADDPG for the Multi-Agent Particle Environment (e.g. "simple spread"): each agent has a decentralized actor (a policy π_i(o_i), or π_i(s_i) if the full state is observable) and a centralized critic Q_i(s, a_1, …, a_n) that takes the full state and all agents' actions.
- Train each critic with TD targets computed from (s, a_1, …, a_n), and train each actor with the gradient of Q_i with respect to agent i's action (DDPG-style).
- Explain why centralized critics help: because Q_i conditions on the full state and the joint action, the critic's learning problem stays stationary even as the other agents' policies change; actor i is updated to maximize Q_i(s, a_1, …, a_i, …, a_n) by adjusting a_i, while at execution time each agent acts from local observations alone (a_i = π_i(o_i)).
- Run on "simple spread" (or a similar task) and report coordination behavior and return.
- Relate MADDPG to multi-robot navigation and to cooperative or competitive game AI.

Concept and real-world RL ...
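The centralized-critic/decentralized-actor updates can be sketched numerically. The following is a minimal NumPy sketch with linear actors and critics for a hypothetical 2-agent toy problem (all dimensions, learning rates, and function names are illustrative assumptions, not the chapter's implementation; a real MADDPG would use neural networks, replay buffers, target networks, and exploration noise). Because the critic is linear in its input, the DDPG-style actor gradient ∂Q_i/∂a_i is simply the critic-weight slice that multiplies a_i:

```python
import numpy as np

# Hypothetical dimensions for a 2-agent toy problem (illustrative only).
N_AGENTS, OBS_DIM, ACT_DIM = 2, 4, 2
STATE_DIM = N_AGENTS * OBS_DIM            # full state = concatenated observations
CRITIC_IN = STATE_DIM + N_AGENTS * ACT_DIM
GAMMA, LR = 0.95, 0.01

rng = np.random.default_rng(0)
# Decentralized linear actors: a_i = W_i @ o_i (each sees only its own observation).
actors = [rng.normal(scale=0.1, size=(ACT_DIM, OBS_DIM)) for _ in range(N_AGENTS)]
# Centralized linear critics: Q_i(s, a_1..a_n) = w_i . [s; a_1; ...; a_n].
critics = [rng.normal(scale=0.1, size=CRITIC_IN) for _ in range(N_AGENTS)]

def act(obs):
    """Decentralized execution: each actor uses only its local observation."""
    return [W @ o for W, o in zip(actors, obs)]

def critic_input(state, actions):
    """Centralized critic input: full state concatenated with the joint action."""
    return np.concatenate([state] + list(actions))

def critic_update(state, actions, rewards, next_state, next_obs):
    """TD update for each centralized critic (no target networks in this sketch)."""
    next_actions = act(next_obs)
    x = critic_input(state, actions)
    x_next = critic_input(next_state, next_actions)
    for i in range(N_AGENTS):
        q = critics[i] @ x
        y = rewards[i] + GAMMA * (critics[i] @ x_next)   # TD target
        critics[i] -= LR * (q - y) * x                   # gradient step on (q - y)^2

def actor_update(state, obs):
    """DDPG-style actor update: ascend Q_i along dQ_i/da_i."""
    for i in range(N_AGENTS):
        # For a linear critic, dQ_i/da_i is the weight slice multiplying a_i.
        start = STATE_DIM + i * ACT_DIM
        dq_da = critics[i][start:start + ACT_DIM]
        # Chain rule through a_i = W_i @ o_i.
        actors[i] += LR * np.outer(dq_da, obs[i])
```

A single training step would call `act` to gather actions, then `critic_update` and `actor_update` on the observed transition; the key structural point is that `act` touches only `o_i`, while both update functions consume the full state and joint action.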

March 10, 2026 · 4 min · 652 words · codefrydev