Explicitly Coordinated Policy Iteration

Yujing Hu, Yingfeng Chen, Changjie Fan, Jianye Hao

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 357-363. https://doi.org/10.24963/ijcai.2019/51

Coordination on an optimal policy between independent learners in fully cooperative stochastic games is difficult due to problems such as relative overgeneralization and miscoordination. Most state-of-the-art algorithms apply fusion heuristics to agents' optimistic and average rewards, through which coordination between agents is achieved only implicitly. However, such implicit coordination faces practical issues such as tedious parameter tuning in real-world applications. The lack of an explicit coordination mechanism may also lead to a low likelihood of coordination in problems with multiple optimal policies. Based on the necessary conditions of an optimal policy, we propose the explicitly coordinated policy iteration (EXCEL) algorithm, which always forces agents to coordinate by comparing their separate optimistic and average value functions. We also propose three solutions for deep reinforcement learning extensions of EXCEL. Extensive experiments in matrix games (from 2-agent 2-action games to 5-agent 20-action games) and stochastic games (from 2-agent to 5-agent games) show that EXCEL outperforms state-of-the-art algorithms, with faster convergence and better coordination.
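To make the core idea concrete, the following Python sketch shows one way an independent learner could maintain separate optimistic and average value estimates and derive a coordination signal by comparing them. This is a hypothetical illustration only: the class name, the update rules, and the coordination test below are assumptions made for readability and are not the paper's exact EXCEL procedure.

import numpy as np

class TwoEstimateLearner:
    """Independent learner with separate optimistic and average Q-tables (illustrative sketch)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95):
        self.q_avg = np.zeros((n_states, n_actions))  # average value estimate
        self.q_opt = np.zeros((n_states, n_actions))  # optimistic value estimate
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, r, s_next, done):
        # Average estimate: standard Q-learning update toward the observed return.
        target = r + (0.0 if done else self.gamma * self.q_avg[s_next].max())
        self.q_avg[s, a] += self.alpha * (target - self.q_avg[s, a])
        # Optimistic estimate: only ever increases, so it tracks the best outcome
        # observed so far and is less affected by teammates' exploratory actions
        # (the issue behind relative overgeneralization).
        opt_target = r + (0.0 if done else self.gamma * self.q_opt[s_next].max())
        self.q_opt[s, a] = max(self.q_opt[s, a], opt_target)

    def coordinated_action(self, s, tol=1e-6):
        # Compare the two estimates: prefer actions whose average value has
        # (nearly) caught up with the optimistic value, i.e. actions on which
        # the team appears to be coordinated, then break ties optimistically.
        gap = self.q_opt[s] - self.q_avg[s]
        candidates = np.flatnonzero(gap <= gap.min() + tol)
        return int(candidates[np.argmax(self.q_opt[s][candidates])])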
Keywords:
Agent-based and Multi-agent Systems: Multi-agent Learning
Machine Learning: Reinforcement Learning
Agent-based and Multi-agent Systems: Coordination and Cooperation
Agent-based and Multi-agent Systems: Cooperative Games