Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints

Huanle Xu, Yang Liu, Wing Cheong Lau, Rui Li

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 2554-2560. https://doi.org/10.24963/ijcai.2020/354

The problem of multi-armed bandits (MAB) with fairness constraints has recently emerged as an important research topic. For such problems, a common objective is to maximize the total reward over a fixed number of rounds while satisfying a fairness requirement: a minimum selection fraction for each individual arm in the long run. Previous works have made substantial advances in designing efficient online selection algorithms; however, they fail to achieve a sublinear regret bound when incorporating such fairness constraints. In this paper, we study a combinatorial MAB problem with a concave objective and fairness constraints. In particular, we adopt a new approach that combines online convex optimization with bandit methods to design selection algorithms. Our algorithm is computationally efficient and, more importantly, achieves a sublinear regret bound with probability guarantees. Finally, we evaluate the performance of our algorithm via extensive simulations and demonstrate that it substantially outperforms the baselines.
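To make the problem setup concrete, below is a minimal simulation sketch of one standard way to handle a minimum-selection-fraction constraint in a combinatorial bandit: virtual queues track each arm's fairness "debt" and are combined with UCB indices when selecting arms. This is only an illustration of the constraint mechanics under assumed parameters (the values of K, m, T, the targets c, and the queue-plus-UCB index rule are all hypothetical), not the OCO-based algorithm proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

K, m, T = 5, 2, 10_000          # K arms, pick m arms per round, T rounds (assumed)
c = np.full(K, 0.15)            # fairness target: min selection fraction per arm
mu = rng.uniform(0.2, 0.9, K)   # hidden Bernoulli reward means (simulation only)

counts = np.zeros(K)            # number of pulls per arm
means = np.zeros(K)             # empirical mean reward per arm
queues = np.zeros(K)            # virtual queues accumulating fairness debt

for t in range(1, T + 1):
    # UCB index; arms never pulled get +inf so each is tried at least once
    bonus = np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts > 0, means + bonus, np.inf)

    # Queue-weighted index trades estimated reward against fairness debt
    # (the relative weighting constant is folded in for brevity)
    index = queues + ucb
    chosen = np.argsort(index)[-m:]

    # Observe rewards for the chosen arms and update running statistics
    rewards = rng.binomial(1, mu[chosen])
    counts[chosen] += 1
    means[chosen] += (rewards - means[chosen]) / counts[chosen]

    # Queue update: debt grows by the target c, shrinks when the arm is played
    played = np.zeros(K)
    played[chosen] = 1.0
    queues = np.maximum(queues + c - played, 0.0)

frac = counts / T
print("selection fractions:", np.round(frac, 3))
print("fairness targets:   ", c)
```

Under this kind of queue-based scheme, each arm's empirical selection fraction should approach its target c while the remaining pulls concentrate on higher-reward arms; the paper's contribution, by contrast, is achieving this jointly with a concave objective and a sublinear regret guarantee.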
Keywords:
Machine Learning: Online Learning
Uncertainty in AI: Other