Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms

Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms

Pratik Gajane, Peter Auer, Ronald Ortner

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 3714-3722. https://doi.org/10.24963/ijcai.2023/413

We consider the problem of navigating in a Markov decision process where extrinsic rewards are either absent or ignored. In this setting, the objective is to learn policies to reach all the states that are reachable within a given number of steps (in expectation) from a starting state. We introduce a novel meta-algorithm which can use any online reinforcement learning algorithm (with appropriate regret guarantees) as a black-box. Our algorithm demonstrates a method for transforming the output of online algorithms to a batch setting. We prove an upper bound on the sample complexity of our algorithm in terms of the regret bound of the used black-box RL algorithm. Furthermore, we provide experimental results to validate the effectiveness of our algorithm and correctness of our theoretical results.
Keywords:
Machine Learning: ML: Reinforcement learning
Machine Learning: ML: Learning theory