Dynamic Bandits with Temporal Structure

Qinyi Chen

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Doctoral Consortium. Pages 5841-5842. https://doi.org/10.24963/ijcai.2022/823

In this work, we study a dynamic multi-armed bandit (MAB) problem in which the expected reward of each arm evolves over time according to an auto-regressive model. We present an algorithm whose per-round regret upper bound nearly matches the regret lower bound, and we demonstrate numerically that it adapts effectively to the changing environment.
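The abstract does not give the exact form of the auto-regressive model or the algorithm. The sketch below illustrates the setting only. It assumes a few things: each arm's mean follows an AR(1) step μ ← clip(α·μ + ε, −B, B) with Gaussian innovations ε and an α shared by all arms, and each pull returns the current mean plus Gaussian noise. The policy is a hypothetical "discounted greedy" heuristic, not the author's algorithm. It shrinks each arm's last observation by α^k, where k is the number of rounds since that arm was pulled, and plays the arm with the largest prediction. The names `simulate`, `innov_sd`, `obs_sd`, and `bound` are illustrative.

```python
import random


def simulate(n_arms=3, horizon=1000, alpha=0.9, innov_sd=0.1,
             obs_sd=0.1, bound=1.0, seed=0):
    """Return the average per-round regret of a discounted-greedy heuristic.

    Assumed model (illustrative, not taken from the paper): each arm's mean
    follows mu <- clip(alpha * mu + N(0, innov_sd^2), -bound, bound), and a
    pull returns mu + N(0, obs_sd^2).
    """
    rng = random.Random(seed)
    mu = [rng.uniform(-bound, bound) for _ in range(n_arms)]
    last_obs = [0.0] * n_arms      # most recent observed reward per arm
    last_time = [None] * n_arms    # round of that observation (None = never pulled)
    total_regret = 0.0

    for t in range(horizon):
        # Predict each mean by shrinking the stale observation toward 0,
        # since E[mu_{t+k} | mu_t] = alpha**k * mu_t for an unclipped AR(1).
        preds = [
            float("inf") if last_time[i] is None
            else alpha ** (t - last_time[i]) * last_obs[i]
            for i in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda i: preds[i])

        reward = mu[arm] + rng.gauss(0.0, obs_sd)
        total_regret += max(mu) - mu[arm]   # regret against the current best mean
        last_obs[arm] = reward
        last_time[arm] = t

        # Evolve every arm's mean one AR(1) step, clipped to [-bound, bound].
        mu = [min(bound, max(-bound, alpha * m + rng.gauss(0.0, innov_sd)))
              for m in mu]

    return total_regret / horizon


print(f"average per-round regret: {simulate():.3f}")
```

In this sketch, α controls how quickly a stale observation loses value. With a small α, an arm's prediction falls back toward zero within a few rounds, and the greedy rule revisits it. With α close to 1, old observations stay informative for longer.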
Keywords:
Machine Learning (ML): General