Reinforcement Learning with Option Machines

Floris den Hengst, Vincent Francois-Lavet, Mark Hoogendoorn, Frank van Harmelen

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 2909-2915. https://doi.org/10.24963/ijcai.2022/403

Reinforcement learning (RL) is a powerful framework for learning complex behaviors, but sees limited adoption in many settings due to its high sample requirements. We introduce a framework for increasing the sample efficiency of RL algorithms. Our approach focuses on optimizing environment rewards with high-level instructions, which are modeled as a high-level controller over temporally extended actions known as options. These options can be looped, interleaved and partially ordered with a rich language for high-level instructions. Crucially, the instructions may be underspecified in the sense that following them does not guarantee high reward in the environment. We present an algorithm for control with these so-called option machines (OMs), discuss option selection for the partially ordered case, and describe an algorithm for learning with OMs. We compare our approach in zero-shot, single-task and multi-task settings in an environment with fully specified and underspecified instructions, and find that OMs perform significantly better than, or comparably to, the state of the art in all environments and learning settings.
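To make the central idea concrete, the following is a minimal, hypothetical sketch of a high-level controller over options as described in the abstract: a finite-state machine whose states select options (temporally extended actions) and whose transitions fire when the active option terminates. All names and the toy corridor environment are illustrative assumptions, not the paper's actual algorithm or notation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Illustrative assumption: an option bundles an intra-option policy with a
# termination condition, as in the standard options framework.
@dataclass
class Option:
    name: str
    policy: Callable[[int], str]       # environment state -> primitive action
    terminates: Callable[[int], bool]  # environment state -> option done?

class OptionMachine:
    """Sketch of a finite-state controller over options: each machine state
    picks an option, and the machine advances when that option terminates."""

    def __init__(self, transitions: Dict[str, Tuple[Option, str]],
                 start: str, accept: str):
        self.transitions = transitions  # machine state -> (option, next state)
        self.state = start
        self.accept = accept

    def done(self) -> bool:
        return self.state == self.accept

    def act(self, env_state: int) -> str:
        option, nxt = self.transitions[self.state]
        action = option.policy(env_state)
        if option.terminates(env_state):
            self.state = nxt  # option finished: advance the machine
        return action

# Toy usage: a 1-D corridor where "right" moves one cell to the right.
# The instruction "reach cell 3, then cell 5" becomes a two-state machine.
reach3 = Option("reach3", policy=lambda s: "right", terminates=lambda s: s >= 3)
reach5 = Option("reach5", policy=lambda s: "right", terminates=lambda s: s >= 5)
om = OptionMachine({"q0": (reach3, "q1"), "q1": (reach5, "qf")},
                   start="q0", accept="qf")

s = 0
while not om.done():
    om.act(s)
    s += 1  # every sketched action moves right by one cell
```

In the paper's setting the option policies themselves would be learned, and the machine could encode loops, interleavings and partial orders over options rather than this single fixed sequence.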
Keywords:
Machine Learning: Reinforcement Learning
Machine Learning: Deep Reinforcement Learning
Planning and Scheduling: Planning with Incomplete Information