Potential Driven Reinforcement Learning for Hard Exploration Tasks

Enmin Zhao; Shihong Deng; Yifan Zang; Yongxin Kang; Kai Li; Junliang Xing

doi:10.24963/ijcai.2020/290

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence

Main track. Pages 2096-2102. https://doi.org/10.24963/ijcai.2020/290

Experience replay plays a crucial role in Reinforcement Learning (RL), enabling the agent to remember and reuse experience from the past. Most previous methods sample experience transitions using simple heuristics, such as uniform sampling or prioritizing good transitions. Since humans can learn from both good and bad experiences, more sophisticated experience replay algorithms are needed. Inspired by potential energy in physics, this work introduces the artificial potential field into experience replay and develops Potentialized Experience Replay (PotER), a new and effective sampling algorithm for RL in hard exploration tasks with sparse rewards. PotER defines a potential energy function for each state in experience replay and helps the agent learn from both good and bad experiences using intrinsic state supervision. PotER can be combined with different RL algorithms as well as with the self-imitation learning algorithm. Experimental analyses and comparisons on multiple challenging hard exploration environments verify its effectiveness and efficiency.
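
As a rough illustration of the idea of attaching a potential to each stored state, the sketch below uses the classic artificial potential field from robot motion planning (a quadratic attractive term toward a goal and a short-range repulsive term around a hazard) and samples replay transitions in proportion to a priority derived from it. This is not the paper's definition of PotER: the 1-D positions, the `GOAL` and `HAZARD` locations, the gains `k_att`, `k_rep`, `radius`, and the `priority` rule are all hypothetical choices made only for this example.

```python
import random

GOAL = 10.0    # hypothetical goal position on a 1-D line
HAZARD = 0.0   # hypothetical "bad" position on the same line


def attractive_potential(dist_to_goal, k_att=1.0):
    """Classic quadratic attractive potential: grows with distance to the goal."""
    return 0.5 * k_att * dist_to_goal ** 2


def repulsive_potential(dist_to_hazard, k_rep=1.0, radius=2.0):
    """Classic repulsive potential: nonzero only within `radius` of the hazard."""
    if dist_to_hazard >= radius:
        return 0.0
    d = max(dist_to_hazard, 1e-6)  # avoid division by zero at the hazard itself
    return 0.5 * k_rep * (1.0 / d - 1.0 / radius) ** 2


def priority(position):
    """Hypothetical priority: high near the goal (good) and near the hazard (bad)."""
    att = attractive_potential(abs(GOAL - position))
    rep = repulsive_potential(abs(position - HAZARD))
    return rep + 1.0 / (1.0 + att)


def sample_by_potential(buffer, batch_size, rng):
    """Sample transitions (with replacement) in proportion to their state priority."""
    weights = [priority(t["state"]) for t in buffer]
    return rng.choices(buffer, weights=weights, k=batch_size)


# Usage: a toy buffer whose states are positions 0.5, 1.5, ..., 9.5 plus the goal.
buffer = [{"state": s + 0.5, "reward": 0.0} for s in range(10)]
buffer.append({"state": GOAL, "reward": 1.0})
batch = sample_by_potential(buffer, batch_size=4, rng=random.Random(0))
```

Under these assumptions, states close to either the goal or the hazard receive more sampling weight than states far from both, which is one simple way to make a replay sampler attend to both good and bad experiences.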

Keywords:

Machine Learning: Reinforcement Learning

Machine Learning: Deep Reinforcement Learning

Heuristic Search and Game Playing: Combinatorial Search and Optimisation