Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks

Pei Xu, Junge Zhang, Kaiqi Huang

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 326-334. https://doi.org/10.24963/ijcai.2023/37

Exploration under sparse rewards is a key challenge in multi-agent reinforcement learning (MARL). Previous works argue that the complex dynamics between agents and the huge exploration space of MARL scenarios amplify the vulnerability of classical count-based exploration methods when combined with agents parameterized by neural networks, resulting in inefficient exploration. In this paper, we show that introducing constrained joint policy diversity into a classical count-based method can significantly improve exploration when agents are parameterized by neural networks. Specifically, we propose a joint policy diversity measure that captures the difference between the current joint policy and previous joint policies, and then use a filtering-based exploration constraint to further refine this diversity. Under the sparse-reward setting, the proposed method significantly outperforms state-of-the-art methods on the multiple-particle environment, Google Research Football, and StarCraft II micromanagement tasks. To the best of our knowledge, on the hard 3s_vs_5z task, which requires non-trivial strategies to defeat the enemies, our method is the first to learn winning strategies without domain knowledge under the sparse-reward setting.
Keywords:
Agent-based and Multi-agent Systems: MAS: Multi-agent learning
Machine Learning: ML: Deep reinforcement learning
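The abstract describes three ingredients: a classical count-based bonus, a joint policy diversity term comparing the current joint policy with previous ones, and a filtering-based constraint that refines that diversity. The sketch below is only an illustrative reading of that description, not the authors' implementation: the count table state_counts, the use of a per-agent KL divergence as the diversity measure, the clipping threshold diversity_threshold, and the scaling factor beta are all assumptions made for this example.

# Minimal sketch (assumed design, not the paper's code): a count-based bonus
# modulated by a clipped joint policy diversity term.
import numpy as np

def count_bonus(state_key, state_counts):
    # Classical count-based bonus: 1 / sqrt(N(s)).
    state_counts[state_key] = state_counts.get(state_key, 0) + 1
    return 1.0 / np.sqrt(state_counts[state_key])

def joint_policy_diversity(current_probs, previous_probs, eps=1e-8):
    # One possible diversity measure (assumption): per-agent KL divergence
    # between the current and a stored previous action distribution, summed
    # over agents.
    div = 0.0
    for cur, prev in zip(current_probs, previous_probs):
        cur = np.asarray(cur) + eps
        prev = np.asarray(prev) + eps
        div += float(np.sum(cur * np.log(cur / prev)))
    return div

def intrinsic_reward(state_key, current_probs, previous_probs,
                     state_counts, beta=0.1, diversity_threshold=0.5):
    # Count bonus scaled by a filtered (here: clipped) diversity term, so the
    # bonus does not reward arbitrary drift away from previous policies.
    bonus = count_bonus(state_key, state_counts)
    diversity = joint_policy_diversity(current_probs, previous_probs)
    diversity = min(diversity, diversity_threshold)
    return bonus * (1.0 + beta * diversity)

# Toy usage: two agents, three discrete actions each.
counts = {}
r_int = intrinsic_reward(
    state_key=("cell_3_4",),
    current_probs=[[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]],
    previous_probs=[[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]],
    state_counts=counts,
)
print(r_int)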