Fast and Fine-grained Autoscaler for Streaming Jobs with Reinforcement Learning

Mingzhe Xing, Hangyu Mao, Zhen Xiao

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 564-570. https://doi.org/10.24963/ijcai.2022/80

On computing clusters, the autoscaler is responsible for allocating resources to jobs or fine-grained tasks to ensure their Quality of Service. Because it manages resources more precisely, fine-grained autoscaling can generally achieve better performance. However, fine-grained autoscaling for streaming jobs requires intensive computation to model the complicated running states of tasks, and it has not been adequately studied. In this paper, we propose a novel fine-grained autoscaler for streaming jobs based on reinforcement learning. We first organize the running states of streaming jobs as spatio-temporal graphs. To make autoscaling decisions efficiently, we propose a Neural Variational Subgraph Sampler that samples spatio-temporal subgraphs. Furthermore, we propose a mutual-information-based objective function that explicitly guides the sampler to extract more representative subgraphs. The autoscaler then makes decisions based on the learned subgraph representations. Experiments conducted on real-world datasets demonstrate the superiority of our method over six competitive baselines.
Keywords:
Agent-based and Multi-agent Systems: Resource Allocation
Data Mining: Mining Spatial and/or Temporal Data
Data Mining: Parallel, Distributed and Cloud-based High Performance Mining
Machine Learning: Deep Reinforcement Learning
Planning and Scheduling: Scheduling