SMARTformer: Semi-Autoregressive Transformer with Efficient Integrated Window Attention for Long Time Series Forecasting

Yiduo Li, Shiyi Qi, Zhe Li, Zhongwen Rao, Lujia Pan, Zenglin Xu

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 2169-2177. https://doi.org/10.24963/ijcai.2023/241

The success of Transformers in long time series forecasting (LTSF) can be attributed to their attention mechanisms and non-autoregressive (NAR) decoder structures, which capture long-range dependencies. However, time series data also contain abundant local temporal dependencies, which are often overlooked in the literature and significantly hinder forecasting performance. To address this issue, we introduce SMARTformer, which stands for SeMi-AutoRegressive Transformer. SMARTformer utilizes the Integrated Window Attention (IWA) and Semi-AutoRegressive (SAR) Decoder to capture global and local dependencies from both encoder and decoder perspectives. IWA conducts local self-attention in multi-scale windows and global attention across windows with linear complexity to achieve complementary clues in local and enlarged receptive fields. SAR generates subsequences iteratively, similar to autoregressive (AR) decoding, but refines the entire sequence in a NAR manner. This way, SAR benefits from both the global horizon of NAR and the local detail capturing of AR. We also introduce the Time-Independent Embedding (TIE), which better captures local dependencies by avoiding entanglements of various periods that can occur when directly adding positional embedding to value embedding. Our extensive experiments on five benchmark datasets demonstrate the effectiveness of SMARTformer against state-of-the-art models, achieving an improvement of 10.2% and 18.4% in multivariate and univariate long-term forecasting, respectively.
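The semi-autoregressive idea described in the abstract can be illustrated with a toy sketch. This is only one possible reading of "generate subsequences iteratively, refine the whole sequence in a NAR manner", not the authors' implementation. The names `toy_decoder` and `semi_autoregressive_decode`, the `chunk` parameter, and the smoothing rule inside `toy_decoder` are all hypothetical stand-ins; a real SAR decoder would be a learned Transformer module.

```python
import numpy as np


def toy_decoder(history, draft):
    """Hypothetical stand-in for a learned NAR decoder.

    Emits the whole forecast horizon in a single pass by pulling the
    current draft halfway toward the mean of the last three observations.
    """
    level = history[-3:].mean()
    return 0.5 * draft + 0.5 * level


def semi_autoregressive_decode(history, horizon, chunk, decoder=toy_decoder):
    """Illustrative semi-autoregressive loop (an assumption, not the paper's code).

    Each iteration runs the decoder over the full horizon (NAR-style),
    then keeps earlier chunks frozen and overwrites everything from the
    current chunk onward (AR-style progression over subsequences).
    """
    # Initial placeholder: repeat the last observed value.
    draft = np.full(horizon, history[-1], dtype=float)
    n_steps = 0
    for start in range(0, horizon, chunk):
        refined = decoder(history, draft)  # one pass over the whole horizon
        draft[start:] = refined[start:]    # earlier chunks stay committed
        n_steps += 1
    return draft, n_steps


history = np.arange(10.0)
forecast, n_steps = semi_autoregressive_decode(history, horizon=8, chunk=4)
print(n_steps, forecast)
```

With a horizon of 8 and a chunk size of 4, the loop makes two decoder passes instead of the eight an element-wise AR decoder would need, while later chunks are still conditioned on the draft produced in earlier passes.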
Keywords:
Data Mining: DM: Mining spatial and/or temporal data
Machine Learning: ML: Regression
Machine Learning: ML: Time series and data streams
Data Mining: DM: Applications