Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints

Huanle Xu, Yang Liu, Wing Cheong Lau, Rui Li

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 2554-2560. https://doi.org/10.24963/ijcai.2020/354

The problem of multi-armed bandits (MAB) with fairness constraints has recently emerged as an important research topic. For such problems, a common objective is to maximize the total reward over a fixed number of rounds while satisfying a fairness requirement: a minimum selection fraction for each individual arm in the long run. Previous works have made substantial advances in designing efficient online selection algorithms; however, they fail to achieve a sublinear regret bound when incorporating such fairness constraints. In this paper, we study a combinatorial MAB problem with a concave objective and fairness constraints. In particular, we adopt a new approach that combines online convex optimization with bandit methods to design selection algorithms. Our algorithm is computationally efficient and, more importantly, achieves a sublinear regret bound with probability guarantees. Finally, we evaluate the performance of our algorithm via extensive simulations and demonstrate that it substantially outperforms the baselines.
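To make the problem setup concrete, below is a minimal simulation sketch of one standard way to handle a minimum-selection-fraction constraint in a combinatorial bandit: virtual queues track each arm's fairness "debt" and are combined with UCB indices when selecting arms. This is only an illustration of the constraint mechanics under assumed parameters (the values of K, m, T, the targets c, and the queue-plus-UCB index rule are all hypothetical), not the OCO-based algorithm proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

K, m, T = 5, 2, 10_000          # K arms, pick m arms per round, T rounds (assumed)
c = np.full(K, 0.15)            # fairness target: min selection fraction per arm
mu = rng.uniform(0.2, 0.9, K)   # hidden Bernoulli reward means (simulation only)

counts = np.zeros(K)            # number of pulls per arm
means = np.zeros(K)             # empirical mean reward per arm
queues = np.zeros(K)            # virtual queues accumulating fairness debt

for t in range(1, T + 1):
    # UCB index; arms never pulled get +inf so each is tried at least once
    bonus = np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts > 0, means + bonus, np.inf)

    # Queue-weighted index trades estimated reward against fairness debt
    # (the relative weighting constant is folded in for brevity)
    index = queues + ucb
    chosen = np.argsort(index)[-m:]

    # Observe rewards for the chosen arms and update running statistics
    rewards = rng.binomial(1, mu[chosen])
    counts[chosen] += 1
    means[chosen] += (rewards - means[chosen]) / counts[chosen]

    # Queue update: debt grows by the target c, shrinks when the arm is played
    played = np.zeros(K)
    played[chosen] = 1.0
    queues = np.maximum(queues + c - played, 0.0)

frac = counts / T
print("selection fractions:", np.round(frac, 3))
print("fairness targets:   ", c)
```

Under this kind of queue-based scheme, each arm's empirical selection fraction should approach its target c while the remaining pulls concentrate on higher-reward arms; the paper's contribution, by contrast, is achieving this jointly with a concave objective and a sublinear regret guarantee.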
Keywords:
Machine Learning: Online Learning
Uncertainty in AI: Other