Dynamic Belief for Decentralized Multi-Agent Cooperative Learning

Yunpeng Zhai, Peixi Peng, Chen Su, Yonghong Tian

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 344-352. https://doi.org/10.24963/ijcai.2023/39

Decentralized multi-agent cooperative learning is of practical importance because agents are partially observed during both training and execution. Each agent must learn to cooperate without access to the observations and policies of the others. Decentralized training is especially difficult due to non-stationarity: other agents' policies keep changing as they learn. To overcome this, we propose to learn a dynamic policy belief for each agent that predicts the current policies of the other agents and conditions the agent's own policy accordingly. To adapt quickly as the others' policies evolve, we introduce a historical context that infers the belief from a few recent action histories of the other agents, and a latent variational inference that models their policies with a learned distribution. We evaluate our method on the StarCraft II micromanagement benchmark (SMAC) and demonstrate superior performance in decentralized training settings, with results comparable to state-of-the-art centralized training with decentralized execution (CTDE) methods.
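
To make the abstract's mechanism concrete, here is a minimal sketch, not the authors' implementation: it assumes a GRU encoder over a short window of other agents' recent actions (the historical context), a Gaussian latent belief inferred via the reparameterization trick (the latent variational inference), and a policy network conditioned on the agent's own observation plus the sampled belief. All module names, dimensions, and the window size are illustrative assumptions.

```python
# Hypothetical sketch of a dynamic policy belief module; not the paper's code.
import torch
import torch.nn as nn


class DynamicBeliefPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, others_act_dim, belief_dim=16, hidden=64):
        super().__init__()
        # Historical context: encode a few recent actions of the other agents.
        self.history_enc = nn.GRU(others_act_dim, hidden, batch_first=True)
        # Latent variational inference: parameters of q(z | action history).
        self.mu_head = nn.Linear(hidden, belief_dim)
        self.logvar_head = nn.Linear(hidden, belief_dim)
        # The agent's own policy, conditioned on observation and sampled belief.
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + belief_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, others_action_history):
        # others_action_history: (batch, window, others_act_dim)
        _, h = self.history_enc(others_action_history)
        h = h.squeeze(0)                              # (batch, hidden)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)          # reparameterized belief sample
        logits = self.policy(torch.cat([obs, z], dim=-1))
        return logits, mu, logvar                     # mu/logvar feed a KL regularizer


# Usage: the belief adapts online as the recent-action window slides forward.
net = DynamicBeliefPolicy(obs_dim=32, act_dim=6, others_act_dim=10)
obs = torch.randn(4, 32)
history = torch.randn(4, 5, 10)   # last 5 joint actions of the other agents
logits, mu, logvar = net(obs, history)
```

Because the belief is re-inferred from only the most recent actions, it can track other agents' policies as they change during training, which is the property the abstract attributes to the dynamic belief.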
Keywords:
Agent-based and Multi-agent Systems: MAS: Multi-agent learning
Agent-based and Multi-agent Systems: MAS: Coordination and cooperation