State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits

Shuang Wu, Jingyu Zhao, Guangjian Tian, Jun Wang

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 458-464. https://doi.org/10.24963/ijcai.2021/64

The restless multi-armed bandit (RMAB) problem is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable because the state and action spaces grow exponentially with the number of arms. Existing approximation approaches, e.g., Whittle's index policy, have difficulty capturing either temporal or spatial factors, such as the impact of other arms. We propose capturing both factors with the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver exploits the decoupling structure of RMABs to obtain solutions with significantly reduced computational overhead. In particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of the proposed method with numerical experiments.
Keywords:
Agent-based and Multi-agent Systems: Multi-agent Planning
Agent-based and Multi-agent Systems: Resource Allocation
Planning and Scheduling: Planning and Scheduling
Planning and Scheduling: Markov Decision Processes
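
The abstract describes an attention-based value function approximator in which each arm's value estimate can depend on the states of the other arms. Below is a minimal, hypothetical sketch of such a module; it is not the authors' implementation, and the class name, embedding sizes, and single self-attention layer are illustrative assumptions only.

```python
# Illustrative sketch (not the paper's actual architecture): a per-arm value
# approximator that applies self-attention across arms so each arm's value
# estimate is conditioned on the joint state. Dimensions are hypothetical.
import torch
import torch.nn as nn

class AttentionValueApproximator(nn.Module):
    def __init__(self, state_dim: int, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.encoder = nn.Linear(state_dim, embed_dim)        # per-arm state embedding
        self.attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.value_head = nn.Linear(embed_dim, 1)             # per-arm value estimate

    def forward(self, arm_states: torch.Tensor) -> torch.Tensor:
        # arm_states: (batch, num_arms, state_dim)
        h = self.encoder(arm_states)                          # (batch, num_arms, embed_dim)
        h, _ = self.attention(h, h, h)                        # arms attend to one another
        return self.value_head(h).squeeze(-1)                 # (batch, num_arms)

# Usage: value estimates for 10 arms with 3-dimensional states
model = AttentionValueApproximator(state_dim=3)
values = model(torch.randn(2, 10, 3))                         # shape: (2, 10)
```

Such an approximator would be trained against targets produced by a Bellman equation solver; how the paper exploits the decoupling structure of RMABs to keep the overall cost linear in the number of arms is detailed in the full text.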