I²HRL: Interactive Influence-based Hierarchical Reinforcement Learning

Rundong Wang, Runsheng Yu, Bo An, Zinovi Rabinovich

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 3131-3138. https://doi.org/10.24963/ijcai.2020/433

Hierarchical reinforcement learning (HRL) is a promising approach to solving tasks with long time horizons and sparse rewards. It is often implemented as a high-level policy that assigns subgoals to a low-level policy. However, it suffers from high-level non-stationarity, since the low-level policy is constantly changing. This non-stationarity also causes a data-efficiency problem: policies need more data at non-stationary states to stabilize training. To address these issues, we propose a novel HRL method: Interactive Influence-based Hierarchical Reinforcement Learning (I²HRL). First, inspired by agent modeling, we enable interaction between the low-level and high-level policies to stabilize high-level policy training: the high-level policy makes decisions conditioned on a received representation of the low-level policy as well as the state of the environment. Second, we further stabilize the high-level policy via an information-theoretic regularization that minimizes its dependence on the changing low-level policy. Third, we propose influence-based exploration to more frequently visit the non-stationary states where more transition data is needed. We experimentally validate the effectiveness of the proposed solution on several MuJoCo tasks, demonstrating that our approach significantly boosts learning performance and accelerates learning compared with state-of-the-art HRL methods.
Keywords:
Machine Learning: Deep Reinforcement Learning
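To make the first idea in the abstract concrete, below is a minimal sketch (PyTorch) of a high-level policy that conditions on a representation of the current low-level policy in addition to the environment state, and outputs a subgoal. The module names, network sizes, and the encoding of the low-level policy from recent transitions are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class LowLevelPolicyEncoder(nn.Module):
    """Encodes low-level policy behavior (here, a batch of its recent
    state-action pairs) into a fixed-size representation vector."""
    def __init__(self, state_dim, action_dim, repr_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, repr_dim),
        )

    def forward(self, states, actions):
        # Mean-pool per-transition embeddings into one policy representation.
        return self.net(torch.cat([states, actions], dim=-1)).mean(dim=0)

class HighLevelPolicy(nn.Module):
    """High-level policy conditioned on the state and the low-level policy
    representation; outputs a subgoal for the low-level policy."""
    def __init__(self, state_dim, repr_dim, subgoal_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + repr_dim, 64), nn.ReLU(),
            nn.Linear(64, subgoal_dim),
        )

    def forward(self, state, low_level_repr):
        return self.net(torch.cat([state, low_level_repr], dim=-1))

# Usage: encode the low-level policy from a few of its recent transitions,
# then pick a subgoal conditioned on both the state and that representation.
state_dim, action_dim, subgoal_dim = 8, 2, 4
encoder = LowLevelPolicyEncoder(state_dim, action_dim)
high_level = HighLevelPolicy(state_dim, repr_dim=16, subgoal_dim=subgoal_dim)

recent_states = torch.randn(32, state_dim)
recent_actions = torch.randn(32, action_dim)
low_repr = encoder(recent_states, recent_actions)

state = torch.randn(state_dim)
subgoal = high_level(state, low_repr)

Feeding the high-level policy this representation is what lets it adapt its subgoal choices as the low-level policy changes, which is the interaction mechanism the abstract describes; the information-theoretic regularization and influence-based exploration build on top of this and are detailed in the paper.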