Guide to Control: Offline Hierarchical Reinforcement Learning Using Subgoal Generation for Long-Horizon and Sparse-Reward Tasks

Wonchul Shin, Yusung Kim

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 4217-4225. https://doi.org/10.24963/ijcai.2023/469

Reinforcement learning (RL) has achieved considerable success in many fields, but applying it to real-world problems can be costly and risky because it requires extensive online interaction. Recently, offline RL has shown that effective policies can be extracted from existing logged data without online interaction. In this work, we propose an offline hierarchical RL method, Guider (Guide to Control), that can efficiently solve long-horizon and sparse-reward tasks from offline data. The high-level policy sequentially generates subgoals that guide the agent toward the final goal, and the low-level policy learns how to reach each generated subgoal. When learning from offline data, the key is to generate subgoals that the low-level policy can actually reach. We show that high-quality subgoal generation is possible by pre-training a latent subgoal prior model. The resulting well-regulated subgoal generation improves performance while avoiding distributional shift in offline RL by breaking long, complex tasks down into shorter, easier ones. In our evaluations, Guider outperforms prior offline RL methods on long-horizon robot navigation and complex manipulation benchmarks. Our code is available at https://github.com/gckor/Guider.
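
The following is a minimal, illustrative sketch of the hierarchical setup described in the abstract, not the authors' Guider implementation (see the linked repository for that): a latent subgoal prior is sampled and decoded into a subgoal by a high-level policy, and a goal-conditioned low-level policy acts to reach it. All module names, network sizes, dimensions, and the subgoal interval `k` are assumptions made for illustration.

```python
# Illustrative sketch only; NOT the authors' Guider code. All dimensions,
# module names, and the subgoal interval k are assumed for the example.
import torch
import torch.nn as nn

STATE_DIM, GOAL_DIM, SUBGOAL_DIM, ACTION_DIM, LATENT_DIM = 30, 2, 2, 8, 16

class SubgoalPrior(nn.Module):
    """Latent prior over subgoals, pre-trainable on offline trajectories."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + GOAL_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * LATENT_DIM))  # mean and log-std

    def forward(self, state, goal):
        mean, log_std = self.net(torch.cat([state, goal], -1)).chunk(2, -1)
        return torch.distributions.Normal(mean, log_std.exp())

class HighLevelPolicy(nn.Module):
    """Decodes a latent sample into a subgoal for the low-level policy."""
    def __init__(self):
        super().__init__()
        self.decoder = nn.Sequential(nn.Linear(LATENT_DIM + STATE_DIM, 128), nn.ReLU(),
                                     nn.Linear(128, SUBGOAL_DIM))

    def forward(self, state, latent):
        return self.decoder(torch.cat([latent, state], -1))

class LowLevelPolicy(nn.Module):
    """Goal-conditioned policy that acts to reach the current subgoal."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + SUBGOAL_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, ACTION_DIM), nn.Tanh())

    def forward(self, state, subgoal):
        return self.net(torch.cat([state, subgoal], -1))

prior, high, low = SubgoalPrior(), HighLevelPolicy(), LowLevelPolicy()
state, goal = torch.randn(1, STATE_DIM), torch.randn(1, GOAL_DIM)
k = 10  # assumed subgoal horizon: propose a new subgoal every k steps
for t in range(3 * k):
    if t % k == 0:  # high level: sample a latent from the prior, decode a subgoal
        subgoal = high(state, prior(state, goal).rsample())
    action = low(state, subgoal)       # low level: act toward the current subgoal
    state = torch.randn(1, STATE_DIM)  # placeholder for an environment step
```

In an offline setting, the prior and both policies would be trained purely from logged trajectories; constraining subgoals to the prior's support is what keeps them reachable and limits distributional shift.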
Keywords:
Machine Learning: ML: Deep reinforcement learning
Planning and Scheduling: PS: Learning in planning and scheduling
Robotics: ROB: Behavior and control