Multi-Objective Neural Bandits with Random Scalarization

Ji Cheng, Bo Xue, Chengyu Lu, Ziqiang Cui, Qingfu Zhang

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 4914-4922. https://doi.org/10.24963/ijcai.2025/547

Multi-objective multi-armed bandit (MOMAB) problems are crucial for complex decision-making scenarios where multiple conflicting objectives must be optimized simultaneously. However, most existing work assumes that the feedback rewards are linear, which significantly constrains its applicability and its ability to capture the intricate dynamics of real-world environments. This paper explores a multi-objective neural bandit (MONB) framework that integrates neural networks, as universal approximators, with classical MOMABs. We adopt random scalarization to accommodate a practitioner's specific needs by placing an appropriate distribution over the regions of interest. Leveraging the exploration-exploitation trade-offs of upper confidence bound (UCB) and Thompson sampling (TS) strategies, we propose two novel algorithms, MONeural-UCB and MONeural-TS. Theoretical and empirical analyses demonstrate the superiority of our methods on multi-objective and multi-task bandit problems, with substantial improvements over classical linear MOMABs.
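To make the selection rule concrete, below is a minimal sketch of how random scalarization can combine with a NeuralUCB-style arm-selection step. This is not the authors' implementation: the RewardNet architecture, the Dirichlet preference distribution over the weight simplex, the diagonal approximation of the gradient-covariance confidence term, and all hyperparameters (gamma, hidden width) are illustrative assumptions.

```python
# Illustrative sketch only; assumed names: RewardNet, select_arm, z_diag, gamma.
import torch
import torch.nn as nn


class RewardNet(nn.Module):
    """Small MLP mapping a d-dimensional context to m objective-reward estimates."""

    def __init__(self, d, m, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, m)
        )

    def forward(self, x):
        return self.net(x)


def flat_grad(scalar, params):
    """Gradient of a scalar w.r.t. all parameters, flattened into one vector."""
    grads = torch.autograd.grad(scalar, params)
    return torch.cat([g.reshape(-1) for g in grads])


def select_arm(model, contexts, z_diag, gamma, w):
    """Pick the arm maximizing the randomly scalarized UCB score.

    contexts: (K, d) arm contexts; w: (m,) sampled preference weights;
    z_diag: running diagonal of the gradient covariance, a cheap stand-in
    for the full NeuralUCB confidence matrix (an assumed simplification).
    """
    params = [p for p in model.parameters() if p.requires_grad]
    best, best_score = 0, -float("inf")
    for k, x in enumerate(contexts):
        pred = model(x)                 # (m,) estimated objective rewards
        scalar = torch.dot(w, pred)     # random scalarization of objectives
        g = flat_grad(scalar, params)   # gradient features for the bonus
        bonus = gamma * torch.sqrt((g * g / z_diag).sum())
        score = (scalar + bonus).item()
        if score > best_score:
            best, best_score = k, score
    return best


# One interaction round (toy usage):
torch.manual_seed(0)
d, m, K = 8, 2, 5
model = RewardNet(d, m)
n_params = sum(p.numel() for p in model.parameters())
z_diag = torch.ones(n_params)  # regularized covariance diagonal
contexts = torch.randn(K, d)
# Sample preference weights from a distribution over the regions of interest.
w = torch.distributions.Dirichlet(torch.ones(m)).sample()
arm = select_arm(model, contexts, z_diag, gamma=0.1, w=w)
# After observing the m-dimensional reward for `arm`, one would add the
# squared gradient to z_diag and train the network on (context, reward).
```

For a TS-style variant in the spirit of MONeural-TS, one could sample the score from a Gaussian centered at the scalarized prediction with the bonus term as its standard deviation, rather than adding the bonus deterministically.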
Keywords:
Machine Learning: ML: Reinforcement learning
Machine Learning: ML: Multi-armed bandits