Chapter 83: Independent Q-Learning (IQL)
Learning objectives

- Implement independent Q-learning (IQL) in a simple cooperative game, e.g. two agents that must "meet" in the same cell or otherwise coordinate on a joint goal.
- Observe the non-stationarity problem: as one agent's policy changes, the transition and reward dynamics seen by the other agent change, so the environment appears non-stationary from each agent's perspective.
- Explain why IQL can still work in some cooperative settings despite non-stationarity, and when it fails or converges slowly.
- Compare IQL with a baseline (e.g. random actions or hand-coded coordination) on the meet-up task or a similar one.
- Relate IQL and non-stationarity to game AI (teammates) and dialogue (multiple interacting agents).

Concept and real-world RL ...
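The core idea can be sketched in a few dozen lines. Below is a minimal tabular IQL implementation on an assumed toy version of the meet-up task: two agents on a 1-D corridor of `N` cells who both receive reward +1 when they land in the same cell. Every name here (`N`, `step_pos`, `train`, `greedy_rollout`, the reward values) is an illustrative assumption, not from the text. The defining feature of IQL is that each agent runs an ordinary single-agent Q-learning update on its own Q-table, treating the other agent as just another part of the environment.

```python
import random

# Toy meet-up task (illustrative assumptions): two agents on a 1-D corridor
# of N cells; the episode ends when they occupy the same cell.
N = 5
ACTIONS = [-1, 0, 1]               # left, stay, right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.2

def step_pos(pos, a):
    """Move one agent, clipping at the corridor walls."""
    return min(N - 1, max(0, pos + ACTIONS[a]))

def make_q():
    # Each agent keeps its OWN Q-table over the joint state (pos0, pos1);
    # the other agent is folded into the (non-stationary) environment.
    return {(i, j): [0.0, 0.0, 0.0] for i in range(N) for j in range(N)}

def choose(q, s, rng, greedy=False):
    if not greedy and rng.random() < EPS:
        return rng.randrange(3)    # epsilon-greedy exploration
    vals = q[s]
    return vals.index(max(vals))

def train(episodes=5000, horizon=20, seed=0):
    rng = random.Random(seed)
    q = [make_q(), make_q()]       # one independent learner per agent
    for _ in range(episodes):
        s = (rng.randrange(N), rng.randrange(N))   # exploring starts
        for _ in range(horizon):
            if s[0] == s[1]:
                break
            acts = [choose(q[i], s, rng) for i in range(2)]
            s2 = (step_pos(s[0], acts[0]), step_pos(s[1], acts[1]))
            done = s2[0] == s2[1]
            r = 1.0 if done else -0.01             # shared cooperative reward
            for i in range(2):                     # independent Q-updates
                target = r if done else r + GAMMA * max(q[i][s2])
                q[i][s][acts[i]] += ALPHA * (target - q[i][s][acts[i]])
            s = s2
    return q

def greedy_rollout(q, s, max_steps=10):
    """Run both greedy policies; return steps until meeting, else None."""
    rng = random.Random(1)         # unused when actions are greedy
    for t in range(max_steps):
        if s[0] == s[1]:
            return t
        acts = [choose(q[i], s, rng, greedy=True) for i in range(2)]
        s = (step_pos(s[0], acts[0]), step_pos(s[1], acts[1]))
    return None
```

Note where the non-stationarity enters: agent 0's update target `r + GAMMA * max(q[0][s2])` depends on which `s2` is reached, which in turn depends on agent 1's current (still changing) policy, so the effective MDP each learner faces drifts during training. A hand-coded baseline for comparison could simply move both agents toward the corridor midpoint.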