On Thompson Sampling and Asymptotic Optimality

Jan Leike; Tor Lattimore; Laurent Orseau; Marcus Hutter

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence

Best Sister Conferences. Pages 4889-4893. https://doi.org/10.24963/ijcai.2017/688

We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) its value converges in mean to the optimal value asymptotically, and (2) under a recoverability assumption, its regret is sublinear. We conclude with a discussion about optimality in reinforcement learning.
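To make the sampling idea concrete, here is a minimal toy sketch, not the construction analyzed in the paper. It assumes a finite class of Bernoulli bandit environments in place of a countable class of general environments, and it resamples a hypothesis at every step, which is a simplification that only makes sense for bandits. The function name `thompson_sampling` and all parameters are hypothetical, chosen for illustration.

```python
import random


def thompson_sampling(env_class, true_index, steps, seed=0):
    """Toy Thompson sampling over a finite class of Bernoulli bandits.

    env_class: list of tuples; each tuple holds the success probability
               of every arm under one hypothesis (assumed to lie in (0, 1)).
    true_index: index of the hypothesis that actually generates rewards.
    Returns the final posterior over hypotheses and the cumulative
    expected regret against the best arm of the true environment.
    """
    rng = random.Random(seed)
    n = len(env_class)
    posterior = [1.0 / n] * n          # uniform prior over the class
    true_env = env_class[true_index]
    best_mean = max(true_env)
    regret = 0.0

    for _ in range(steps):
        # 1. Sample one hypothesis from the current posterior.
        k = rng.choices(range(n), weights=posterior)[0]
        sampled = env_class[k]

        # 2. Act optimally as if the sampled hypothesis were true.
        arm = max(range(len(sampled)), key=lambda a: sampled[a])

        # 3. Observe a reward from the true environment.
        reward = 1 if rng.random() < true_env[arm] else 0
        regret += best_mean - true_env[arm]

        # 4. Bayes update: reweight each hypothesis by its likelihood.
        unnorm = [
            p * (env[arm] if reward else 1.0 - env[arm])
            for p, env in zip(posterior, env_class)
        ]
        total = sum(unnorm)
        posterior = [u / total for u in unnorm]

    return posterior, regret


if __name__ == "__main__":
    envs = [(0.2, 0.8), (0.8, 0.2), (0.5, 0.5)]
    post, reg = thompson_sampling(envs, true_index=0, steps=2000)
    print("posterior:", post)
    print("cumulative expected regret:", reg)
```

In this toy setting every pull is informative about which hypothesis is true, so the posterior concentrates and the regret stops growing. The general environments considered in the abstract need not have this property, which is why a recoverability assumption appears in the regret result.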

Keywords:

Artificial Intelligence: machine learning

Artificial Intelligence: uncertainty in artificial intelligence

Artificial Intelligence: artificial intelligence