Constrained Bayesian Reinforcement Learning via Approximate Linear Programming

Jongmin Lee, Youngsoo Jang, Pascal Poupart, Kee-Eung Kim

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 2088-2095. https://doi.org/10.24963/ijcai.2017/290

In this paper, we consider the safe learning scenario where we need to restrict the exploratory behavior of a reinforcement learning agent. Specifically, we treat the problem as a form of Bayesian reinforcement learning in an environment that is modeled as a constrained MDP (CMDP) where the cost function penalizes undesirable situations. We propose a model-based Bayesian reinforcement learning (BRL) algorithm for such an environment, eliciting risk-sensitive exploration in a principled way. Our algorithm efficiently solves the constrained BRL problem by approximate linear programming, and generates a finite state controller in an off-line manner. We provide theoretical guarantees and demonstrate empirically that our approach outperforms the state of the art.
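To make the linear-programming view concrete, below is a minimal sketch of the standard exact occupancy-measure LP for a small, fully known constrained MDP, solved with SciPy. It is not the paper's approximate Bayesian construction; the toy state/action spaces, random reward and cost functions, the designated zero-cost "safe" action, and the budget c_hat are all illustrative assumptions, chosen only to show the LP structure (reward objective, Bellman-flow equalities, and a single cost-budget inequality) that the paper's method approximates at scale.

# Sketch: exact occupancy-measure LP for a small constrained MDP.
# Variables x(s,a) >= 0; objective maximizes discounted reward subject to
# Bellman-flow constraints and a bound on expected discounted cost.
import numpy as np
from scipy.optimize import linprog

S, A = 3, 2                # illustrative state/action space sizes (assumption)
gamma = 0.95               # discount factor
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition probabilities
R = rng.uniform(0.0, 1.0, size=(S, A))       # reward function
C = rng.uniform(0.0, 1.0, size=(S, A))       # cost of undesirable outcomes
C[:, 0] = 0.0              # action 0 is a zero-cost "safe" action, so the LP is feasible
mu0 = np.full(S, 1.0 / S)                    # initial state distribution
c_hat = 5.0                                  # cost budget (illustrative threshold)

# Flow constraints: sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu0(s')
A_eq = np.zeros((S, S * A))
for s_next in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[s_next, s * A + a] = float(s == s_next) - gamma * P[s, a, s_next]
b_eq = mu0

# Cost constraint: sum_{s,a} C(s,a) x(s,a) <= c_hat
A_ub = C.reshape(1, -1)
b_ub = np.array([c_hat])

# Maximize expected discounted reward  <=>  minimize -sum_{s,a} R(s,a) x(s,a)
res = linprog(c=-R.ravel(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (S * A), method="highs")

x = res.x.reshape(S, A)
policy = x / x.sum(axis=1, keepdims=True)    # stochastic policy pi(a|s) from occupancies
print("expected discounted reward:", R.ravel() @ res.x)
print("expected discounted cost:  ", C.ravel() @ res.x, "<=", c_hat)
print("policy:\n", policy)

In this exact formulation the number of variables grows with |S| x |A|, which is intractable for the belief-augmented state space arising in model-based Bayesian RL; the approximate linear programming approach in the paper addresses exactly that scaling issue while producing an off-line finite state controller.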
Keywords:
Machine Learning: Reinforcement Learning
Planning and Scheduling: POMDPs
Uncertainty in AI: Markov Decision Processes