Chapter 84: Centralized Training, Decentralized Execution (CTDE)
Learning objectives

- Explain the CTDE paradigm: during training, algorithms may use centralized information (e.g., the global state and all agents' actions) to learn better value functions or gradients; during execution, each agent acts using only its local observation and policy (decentralized).
- Give a concrete example (e.g., QMIX, MADDPG, or a simple cooperative task) in which the critic or value function uses the global state while the actor uses only its local observation.
- Explain why CTDE helps with non-stationarity: during training, the centralized critic sees the full state and the other agents' actions, so the environment is stationary from the critic's perspective (the joint action is known); each agent's policy can then be trained against this stable learning signal.
- Identify why decentralized execution is important for scalability and deployment (agents need not communicate all observations at test time).
- Relate CTDE to game AI (team coordination) and to robot navigation (multi-robot systems).

Concept and real-world RL ...
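To make the information asymmetry concrete, here is a minimal numpy sketch of the MADDPG-style interface: a centralized critic that scores the global state together with the joint action, and per-agent actors that see only their local observations. All dimensions, the linear/tanh function forms, and the choice of "global state = concatenated observations" are illustrative assumptions, not a full learning algorithm (no gradients or replay buffer are shown).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for a 2-agent cooperative task (illustrative only).
N_AGENTS = 2
OBS_DIM = 4                      # each agent's local observation
ACT_DIM = 2                      # each agent's continuous action
STATE_DIM = N_AGENTS * OBS_DIM   # here: global state = concatenated local obs


def actor(params, obs):
    """Decentralized actor: maps ONE agent's local observation to its action.

    This is the only component needed at execution time.
    """
    W, b = params
    return np.tanh(obs @ W + b)                  # shape (ACT_DIM,)


def centralized_critic(params, state, joint_action):
    """Centralized critic: scores the GLOBAL state plus ALL agents' actions.

    Because it conditions on the joint action, the other agents are no
    longer an unobserved, shifting part of the environment from its
    perspective. Used during training only.
    """
    W, b = params
    x = np.concatenate([state, joint_action])
    return float(x @ W + b)                      # scalar Q(s, a_1, ..., a_N)


# Randomly initialized toy parameters (a real system would learn these).
actor_params = [
    (0.1 * rng.normal(size=(OBS_DIM, ACT_DIM)), np.zeros(ACT_DIM))
    for _ in range(N_AGENTS)
]
critic_params = (0.1 * rng.normal(size=STATE_DIM + N_AGENTS * ACT_DIM), 0.0)

# --- Training time: the critic sees everything ---
local_obs = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
global_state = np.concatenate(local_obs)
joint_action = np.concatenate(
    [actor(p, o) for p, o in zip(actor_params, local_obs)]
)
q = centralized_critic(critic_params, global_state, joint_action)

# --- Execution time: each agent acts on its own observation alone ---
a0 = actor(actor_params[0], local_obs[0])  # no global state, no other agents
```

Note the asymmetry in the two call signatures: `centralized_critic` takes `(state, joint_action)` and so exists only on the training side, while `actor` takes a single agent's observation, which is exactly why deployment requires no cross-agent communication.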