MARL


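Model Rock-Paper-Scissors as a Dec-POMDP. Note that a Dec-POMDP assumes one reward shared by all agents, so the zero-sum payoffs strictly call for per-agent rewards, as in a partially observable stochastic game.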
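One way to approach the modelling exercise is to write out the tuple components explicitly. The sketch below is illustrative only: the `OneShotGame` container, its field names, and the single dummy state and observation are assumptions made here, and the rewards are stored per agent rather than shared.

```python
from itertools import product
from typing import Callable, NamedTuple

ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(a1: str, a2: str) -> int:
    """Payoff to agent 0; agent 1 receives the negation."""
    if a1 == a2:
        return 0
    return 1 if BEATS[a1] == a2 else -1


class OneShotGame(NamedTuple):
    # Hypothetical container mirroring the usual tuple components.
    agents: tuple
    states: tuple
    actions: tuple
    observations: tuple
    transition: Callable
    observe: Callable
    rewards: Callable
    horizon: int


rps = OneShotGame(
    agents=(0, 1),
    states=("start",),            # the game is stateless, so one dummy state
    actions=ACTIONS,              # same action set for both agents
    observations=("nothing",),    # neither agent sees the other's move first
    transition=lambda state, joint: "start",
    observe=lambda state, agent: "nothing",
    rewards=lambda state, joint: (payoff(*joint), -payoff(*joint)),
    horizon=1,
)

# Sanity check: every joint action yields rewards that sum to zero.
zero_sum = all(
    sum(rps.rewards("start", joint)) == 0
    for joint in product(ACTIONS, repeat=2)
)
```

The partial observability here is degenerate: each agent acts without seeing anything, which is what makes simultaneous moves expressible in this framework.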

Find the Nash equilibrium of a 2×2 matrix game; compare it with the outcome of independent learning.
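For the equilibrium part of this exercise, a 2×2 game is small enough to solve directly. The sketch below checks every cell for a pure equilibrium and uses the indifference conditions for a fully mixed one. The function names and the two example games (Matching Pennies and a Prisoner's Dilemma with payoffs chosen here) are assumptions for illustration.

```python
from fractions import Fraction


def pure_nash(A, B):
    """Cells (row, col) where neither player gains by deviating alone.

    A holds the row player's payoffs, B the column player's.
    """
    equilibria = []
    for r in range(2):
        for c in range(2):
            row_ok = A[r][c] >= A[1 - r][c]   # row player cannot improve
            col_ok = B[r][c] >= B[r][1 - c]   # column player cannot improve
            if row_ok and col_ok:
                equilibria.append((r, c))
    return equilibria


def mixed_nash(A, B):
    """Fully mixed equilibrium (p, q), or None if there is none.

    p is the probability of row 0 and makes the column player indifferent;
    q is the probability of column 0 and makes the row player indifferent.
    """
    denom_p = B[0][0] - B[0][1] - B[1][0] + B[1][1]
    denom_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    if denom_p == 0 or denom_q == 0:
        return None
    p = Fraction(B[1][1] - B[1][0], denom_p)
    q = Fraction(A[1][1] - A[0][1], denom_q)
    if 0 < p < 1 and 0 < q < 1:
        return p, q
    return None


# Matching Pennies: zero-sum, no pure equilibrium.
pennies_A = [[1, -1], [-1, 1]]
pennies_B = [[-1, 1], [1, -1]]

# Prisoner's Dilemma (index 0 = cooperate, 1 = defect), illustrative payoffs.
pd_A = [[3, 0], [5, 1]]
pd_B = [[3, 5], [0, 1]]
```

With these inputs, `mixed_nash(pennies_A, pennies_B)` gives probability 1/2 for each action, while `pure_nash(pd_A, pd_B)` returns only mutual defection.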

Explain centralized training with decentralized execution (CTDE) with an example, and explain why it helps with non-stationarity.
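A toy example may help with the CTDE exercise. In the sketch below, two agents earn a shared reward of 1 when they pick the same action. A centralized critic scores joint actions during training, while each actor keeps only its own logits and acts alone at execution time. Because the critic conditions on the joint action, its learning target does not shift when the teammate's policy changes, which is the usual argument for why CTDE eases non-stationarity. Everything here is an assumption for illustration: the game, the learning rates, the small initial bias that breaks symmetry, and the use of exact expectations in place of sampled rollouts to keep the run deterministic.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def team_reward(a0, a1):
    # Shared reward: 1 when both agents choose the same action.
    return 1.0 if a0 == a1 else 0.0


# Decentralized actors: each agent owns only its own logits.
# The 0.1 bias toward action 0 is an assumption to break symmetry.
actor_logits = [[0.1, 0.0], [0.1, 0.0]]

# Centralized critic: a table over JOINT actions, used only in training.
critic = [[0.0, 0.0], [0.0, 0.0]]

for _ in range(500):
    # Critic update: move each joint-action value toward the shared reward.
    for a0 in range(2):
        for a1 in range(2):
            critic[a0][a1] += 0.5 * (team_reward(a0, a1) - critic[a0][a1])

    pi = [softmax(logits) for logits in actor_logits]
    # Per-agent action values: critic averaged over the teammate's policy.
    q0 = [sum(pi[1][b] * critic[a][b] for b in range(2)) for a in range(2)]
    q1 = [sum(pi[0][a] * critic[a][b] for a in range(2)) for b in range(2)]

    for i, q in enumerate((q0, q1)):
        baseline = sum(p * v for p, v in zip(pi[i], q))
        for a in range(2):
            # Exact softmax policy gradient: pi(a) * (Q(a) - V).
            actor_logits[i][a] += 1.0 * pi[i][a] * (q[a] - baseline)


def act(agent_index):
    """Decentralized execution: uses the agent's own logits, not the critic."""
    logits = actor_logits[agent_index]
    return logits.index(max(logits))


joint_action = (act(0), act(1))
```

After training, the critic is no longer needed: `act` reads only the calling agent's parameters, and the two agents settle on the same action.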

Have each agent output a message and an action; train the agents on a coordination task.
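As a starting point for this exercise, the interface can be sketched with a tiny coordination task. Each agent holds a private bit, sends a one-bit message, then acts by guessing the teammate's bit from the message it received. The task, the shared tabular policy, and the function names are assumptions made here, and the brute-force search over all 16 policies is only a stand-in for a real learning algorithm.

```python
from itertools import product


def play_round(policy, obs_a, obs_b):
    """Run one round with a policy shared by both agents."""
    # Phase 1: each agent emits a message based on its private bit.
    msg_a = policy["message"][obs_a]
    msg_b = policy["message"][obs_b]
    # Phase 2: each agent acts on the message it received.
    act_a = policy["action"][msg_b]
    act_b = policy["action"][msg_a]
    # Shared reward: fraction of agents that named the teammate's bit.
    return ((act_a == obs_b) + (act_b == obs_a)) / 2


def expected_reward(policy):
    """Average reward over all four equally likely pairs of private bits."""
    rounds = [play_round(policy, a, b) for a, b in product((0, 1), repeat=2)]
    return sum(rounds) / len(rounds)


# Every deterministic policy: a message table and an action table,
# each indexed by a single bit.
candidates = [
    {"message": msg, "action": act}
    for msg in product((0, 1), repeat=2)
    for act in product((0, 1), repeat=2)
]

# Stand-in for training: pick the policy with the highest expected reward.
best = max(candidates, key=expected_reward)
```

A policy that ignores its private bit when choosing the message cannot beat chance here, so a perfect score requires the message to carry information. That is the property a learned protocol would need to discover.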