Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns

Yong Liu, Yujing Hu, Yang Gao, Yingfeng Chen, Changjie Fan

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 457-463. https://doi.org/10.24963/ijcai.2019/65

Many real-world problems, such as robot control and soccer games, are naturally modeled as sparse-interaction multi-agent systems. Reusing single-agent knowledge in multi-agent systems with sparse interactions can greatly accelerate the multi-agent learning process. Previous works rely on the bisimulation metric to define Markov decision process (MDP) similarity for controlling knowledge transfer. However, the bisimulation metric is costly to compute and unsuitable for problems with high-dimensional state spaces. In this work, we propose more scalable transfer learning methods based on a novel MDP similarity concept. We first define MDP similarity in terms of the N-step return (NSR) values of an MDP. We then propose two knowledge transfer methods based on deep neural networks, called direct value function transfer and NSR-based value function transfer. We conduct experiments in an image-based grid world, the multi-agent particle environment (MPE), and the Ms. Pac-Man game. The results indicate that the proposed methods significantly accelerate multi-agent reinforcement learning while also achieving better asymptotic performance.
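The abstract only sketches the NSR idea, so here is a minimal Python illustration of what computing N-step returns and comparing two MDPs by their NSR estimates might look like. The function names (`n_step_returns`, `nsr_similarity`) and the choice of mean absolute difference as the distance are assumptions for illustration, not the paper's actual formulation.

```python
# Hypothetical sketch of the NSR idea: estimate n-step returns for the
# states of an MDP from a sampled trajectory, then score the similarity
# of two MDPs by comparing their NSR estimates on a shared set of states.
import numpy as np

def n_step_returns(rewards, values, gamma=0.99, n=5):
    """Compute G_t = sum_{k=0}^{n-1} gamma^k * r_{t+k} + gamma^n * V(s_{t+n})
    for every step t of a single trajectory. `values` holds per-state value
    estimates (e.g., from a learned value network)."""
    T = len(rewards)
    returns = np.zeros(T)
    for t in range(T):
        horizon = min(n, T - t)  # truncate near the end of the episode
        g = sum(gamma ** k * rewards[t + k] for k in range(horizon))
        if t + n < T:
            g += gamma ** n * values[t + n]  # bootstrap from the value estimate
        returns[t] = g
    return returns

def nsr_similarity(nsr_a, nsr_b):
    """Similarity of two MDPs as the negated mean absolute difference of
    their NSR estimates on shared states (an illustrative choice)."""
    return -np.mean(np.abs(nsr_a - nsr_b))

# Example usage on a toy 5-step trajectory with zero value estimates:
rewards = np.array([0.0, 0.0, 1.0, 0.0, 1.0])
values = np.zeros(5)
print(n_step_returns(rewards, values, gamma=0.9, n=3))
```

A similarity score of this kind could then gate how strongly single-agent value functions are transferred into the multi-agent setting, which is the role the abstract assigns to NSR-based MDP similarity.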
Keywords:
Agent-based and Multi-agent Systems: Multi-agent Learning
Machine Learning: Reinforcement Learning
Machine Learning: Transfer, Adaptation, Multi-task Learning