Hierarchical Transformer for Scalable Graph Learning
Wenhao Zhu, Tianyu Wen, Guojie Song, Xiaojun Ma, Liang Wang
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 4702-4710.
https://doi.org/10.24963/ijcai.2023/523
The Graph Transformer is gaining increasing attention in the field of machine learning and has demonstrated state-of-the-art performance on benchmarks for graph representation learning. However, because current Graph Transformer implementations primarily target small-scale graphs, the quadratic complexity of the global self-attention mechanism makes full-batch training on larger graphs challenging. Moreover, conventional sampling-based methods fail to capture the necessary high-level contextual information, resulting in a significant loss of performance. In this paper, we introduce the Hierarchical Scalable Graph Transformer (HSGT) to address these challenges. HSGT scales the Transformer architecture to node representation learning on large-scale graphs while maintaining high performance. By utilizing graph hierarchies constructed through coarsening techniques, HSGT efficiently updates and stores multi-scale information in node embeddings at different levels. Combined with sampling-based training methods, HSGT effectively captures and aggregates multi-level information on the hierarchical graph using only Transformer blocks. Empirical evaluations demonstrate that HSGT achieves state-of-the-art performance with high efficiency on large-scale benchmarks containing graphs with millions of nodes.
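To make the abstract's high-level idea concrete, the sketch below illustrates the general pattern of attending within a fine node level and a coarsened super-node level, then fusing the two scales per node. It is a minimal, hypothetical illustration only, not the authors' HSGT implementation: all names (coarsen_graph, HierarchicalGraphTransformerSketch, etc.) are assumptions, random cluster assignment stands in for a proper structure-aware coarsening algorithm, and full-batch attention stands in for the paper's sampling-based training.

```python
# Hedged sketch of a hierarchy-aware Transformer over a coarsened graph.
# Not the HSGT code; a toy two-level illustration of the coarsen -> attend -> fuse idea.
import torch
import torch.nn as nn


def coarsen_graph(x: torch.Tensor, num_clusters: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Assign nodes to clusters and average their features into super-node embeddings.

    A real pipeline would use structure-aware graph coarsening; random assignment
    is used here only to keep the sketch self-contained and runnable.
    """
    assignment = torch.randint(0, num_clusters, (x.size(0),))
    coarse_x = torch.zeros(num_clusters, x.size(1))
    coarse_x.index_add_(0, assignment, x)
    counts = torch.bincount(assignment, minlength=num_clusters).clamp(min=1).unsqueeze(1)
    return coarse_x / counts, assignment


class TransformerBlock(nn.Module):
    """Standard pre-norm Transformer encoder block used at every hierarchy level."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ffn(self.norm2(x))


class HierarchicalGraphTransformerSketch(nn.Module):
    """Two-level toy model: attend within the fine level and the coarse level,
    then fuse each node's embedding with the embedding of its super-node."""

    def __init__(self, dim: int, num_clusters: int):
        super().__init__()
        self.num_clusters = num_clusters
        self.fine_block = TransformerBlock(dim)
        self.coarse_block = TransformerBlock(dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fine level: self-attention over the node batch (full batch in this toy).
        fine = self.fine_block(x.unsqueeze(0)).squeeze(0)
        # Coarse level: self-attention over super-nodes produced by coarsening.
        coarse_x, assignment = coarsen_graph(fine, self.num_clusters)
        coarse = self.coarse_block(coarse_x.unsqueeze(0)).squeeze(0)
        # Fuse multi-scale information back into per-node embeddings.
        return self.fuse(torch.cat([fine, coarse[assignment]], dim=-1))


if __name__ == "__main__":
    nodes, dim = 128, 64
    model = HierarchicalGraphTransformerSketch(dim, num_clusters=8)
    out = model(torch.randn(nodes, dim))
    print(out.shape)  # torch.Size([128, 64])
```

In this toy version the coarse level gives every node access to summary context beyond its sampled neighborhood, which is the intuition behind combining hierarchy construction with sampling-based training described in the abstract.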
Keywords:
Machine Learning: ML: Sequence and graph learning