Constrained Bayesian Reinforcement Learning via Approximate Linear Programming

Jongmin Lee, Youngsoo Jang, Pascal Poupart, Kee-Eung Kim

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 2088-2095. https://doi.org/10.24963/ijcai.2017/290

In this paper, we consider the safe learning scenario where we need to restrict the exploratory behavior of a reinforcement learning agent. Specifically, we treat the problem as a form of Bayesian reinforcement learning in an environment that is modeled as a constrained MDP (CMDP) where the cost function penalizes undesirable situations. We propose a model-based Bayesian reinforcement learning (BRL) algorithm for such an environment, eliciting risk-sensitive exploration in a principled way. Our algorithm efficiently solves the constrained BRL problem by approximate linear programming, and generates a finite state controller in an off-line manner. We provide theoretical guarantees and demonstrate empirically that our approach outperforms the state of the art.
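To make the linear-programming view concrete, below is a minimal sketch of the standard exact occupancy-measure LP for a small, fully known constrained MDP, solved with SciPy. It is not the paper's approximate Bayesian construction; the toy state/action spaces, random reward and cost functions, the designated zero-cost "safe" action, and the budget c_hat are all illustrative assumptions, chosen only to show the LP structure (reward objective, Bellman-flow equalities, and a single cost-budget inequality) that the paper's method approximates at scale.

# Sketch: exact occupancy-measure LP for a small constrained MDP.
# Variables x(s,a) >= 0; objective maximizes discounted reward subject to
# Bellman-flow constraints and a bound on expected discounted cost.
import numpy as np
from scipy.optimize import linprog

S, A = 3, 2                # illustrative state/action space sizes (assumption)
gamma = 0.95               # discount factor
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition probabilities
R = rng.uniform(0.0, 1.0, size=(S, A))       # reward function
C = rng.uniform(0.0, 1.0, size=(S, A))       # cost of undesirable outcomes
C[:, 0] = 0.0              # action 0 is a zero-cost "safe" action, so the LP is feasible
mu0 = np.full(S, 1.0 / S)                    # initial state distribution
c_hat = 5.0                                  # cost budget (illustrative threshold)

# Flow constraints: sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu0(s')
A_eq = np.zeros((S, S * A))
for s_next in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[s_next, s * A + a] = float(s == s_next) - gamma * P[s, a, s_next]
b_eq = mu0

# Cost constraint: sum_{s,a} C(s,a) x(s,a) <= c_hat
A_ub = C.reshape(1, -1)
b_ub = np.array([c_hat])

# Maximize expected discounted reward  <=>  minimize -sum_{s,a} R(s,a) x(s,a)
res = linprog(c=-R.ravel(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (S * A), method="highs")

x = res.x.reshape(S, A)
policy = x / x.sum(axis=1, keepdims=True)    # stochastic policy pi(a|s) from occupancies
print("expected discounted reward:", R.ravel() @ res.x)
print("expected discounted cost:  ", C.ravel() @ res.x, "<=", c_hat)
print("policy:\n", policy)

In this exact formulation the number of variables grows with |S| x |A|, which is intractable for the belief-augmented state space arising in model-based Bayesian RL; the approximate linear programming approach in the paper addresses exactly that scaling issue while producing an off-line finite state controller.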
Keywords:
Machine Learning: Reinforcement Learning
Planning and Scheduling: POMDPs
Uncertainty in AI: Markov Decision Processes