Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

Bo Xue, Guanghui Wang, Yimu Wang, Lijun Zhang

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 2936-2942. https://doi.org/10.24963/ijcai.2020/406

In this paper, we study the problem of stochastic linear bandits with finite action sets. Most of existing work assume the payoffs are bounded or sub-Gaussian, which may be violated in some scenarios such as financial markets. To settle this issue, we analyze the linear bandits with heavy-tailed payoffs, where the payoffs admit finite 1+epsilon moments for some epsilon in (0,1]. Through median of means and dynamic truncation, we propose two novel algorithms which enjoy a sublinear regret bound of widetilde{O}(d^(1/2)T^(1/(1+epsilon))), where d is the dimension of contextual information and T is the time horizon. Meanwhile, we provide an Omega(d^(epsilon/(1+epsilon))T^(1/(1+epsilon))) lower bound, which implies our upper bound matches the lower bound up to polylogarithmic factors in the order of d and T when epsilon=1. Finally, we conduct numerical experiments to demonstrate the effectiveness of our algorithms and the empirical results strongly support our theoretical guarantees.
Keywords:
Machine Learning: Online Learning
Machine Learning: Probabilistic Machine Learning