Problem-dependent Regret for Lexicographic Multi-Armed Bandits with Adversarial Corruptions
Bo Xue, Xi Lin, Yuanyu Wan, Qingfu Zhang
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 6776-6784.
https://doi.org/10.24963/ijcai.2025/754
This paper studies lexicographic multi-armed bandits (MAB), where, after selecting an arm, the agent observes a reward vector containing multiple objectives, each with a different level of importance. Although previous literature has proposed an algorithm for lexicographic MAB, it suffers from several limitations: (1) it exhibits poor adversarial robustness because it relies on stochastic rewards, (2) its regret bound is suboptimal compared to single-objective counterparts, and (3) its regret bound does not adapt to specific problem instances. To address these limitations, we study lexicographic MAB with adversarial corruptions, where an adversary may corrupt the stochastic rewards within a corruption budget of C. First, when the value of C is known, we propose an algorithm achieving a problem-dependent regret bound of O(∑(log T / Δⁱ(a) + C)) for the i-th objective (i ∈ [M]), where Δⁱ(a) is the reward gap of arm a on the i-th objective and M is the number of objectives. In the purely stochastic setting (C = 0), this regret bound approaches optimality. Second, we introduce another algorithm that does not require knowledge of C but incurs a less favorable regret bound of O(∑(γ_T / Δⁱ(a) + γ_T)) for the i-th objective, where γ_T = O((log T)² + KC(log T)²). Finally, we conduct experiments on both synthetic and real-world datasets to verify the effectiveness of our algorithms.
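To make the setting concrete, the following minimal Python sketch simulates the interaction protocol described above: at each round the agent pulls an arm, an adversary with total budget C may corrupt the stochastic reward vector, and per-objective pseudo-regret is measured against the lexicographically optimal arm. Everything here (the values of K, M, T, C, the lexicographic_best helper, the uniform placeholder policy, and the reward-flipping adversary) is an illustrative assumption of ours, not the paper's algorithm.

import numpy as np

rng = np.random.default_rng(0)
K, M, T, C = 5, 2, 10_000, 50           # arms, objectives, horizon, corruption budget
means = rng.uniform(size=(K, M))        # mean reward vector mu^i(a) for each arm a

def lexicographic_best(mu):
    """Arm maximising objective 1, breaking exact ties by objective 2, and so on."""
    cand = np.arange(mu.shape[0])
    for i in range(mu.shape[1]):
        cand = cand[mu[cand, i] == mu[cand, i].max()]
    return cand[0]

a_star = lexicographic_best(means)
gaps = means[a_star] - means            # Delta^i(a) = mu^i(a*) - mu^i(a), shape (K, M)

budget = C
regret = np.zeros(M)
for t in range(T):
    arm = rng.integers(K)               # uniform placeholder policy; the paper's algorithms act here
    reward = rng.binomial(1, means[arm]).astype(float)  # stochastic Bernoulli reward vector
    if budget >= 1:                     # toy adversary: flip rewards until the budget runs out
        reward = 1.0 - reward
        budget -= 1
    regret += gaps[arm]                 # per-objective pseudo-regret increment

print("lexicographically optimal arm:", a_star)
print("per-objective pseudo-regret after T rounds:", regret)

Replacing the uniform placeholder with a corruption-robust arm-selection rule is exactly what the paper's two algorithms do, yielding the regret bounds stated in the abstract.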
Keywords:
Machine Learning: ML: Online learning
Machine Learning: ML: Multi-armed bandits
Machine Learning: ML: Robustness
Uncertainty in AI: UAI: Sequential decision making
