SEP: A General Lossless Compression Framework with Semantics Enhancement and Multi-Stream Pipelines
SEP: A General Lossless Compression Framework with Semantics Enhancement and Multi-Stream Pipelines
Meng Wan, Rongqiang Cao, Yanghao Li, Jue Wang, Zijian Wang, Qi Su, Lei Qiu, Peng Shi, Yangang Wang, Chong Li
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 3326-3334.
https://doi.org/10.24963/ijcai.2025/370
Deep-learning-based lossless compression is of immense importance in real-world applications, such as cold data persistence, sensor data collection, and astronomical data transmission. However, existing compressors typically model data using single-byte symbols as tokens, which makes it hard to capture the inherent correlations and cannot effectively utilize the parallel capabilities of GPU and multi-core CPU. This paper proposes SEP, a novel lossless compression framework for most time-series backbone neural networks. We first introduce a semantic enhancement module to capture the complex intra-patch relationships of binary byte streams. To improve the compression speed, we design multi-stream pipelines that dynamically assign parallel tasks to GPU streams and multi-cores. We further propose a novel GPU memory optimization strategy, which reuses GPU memory by a shared pool across streams. We conduct experiments on seven real-world datasets and the results demonstrate that our SEP framework outperforms state-of-the-art compressors with an average speed improvement of 30.0% and an average compression ratio gain of 5.1%, which is further elevated to 7.6% with the use of pre-training models. The GPU memory footprint is reduced by as high as 63.1% and by an average of 36.2%. The source code is available at: https://github.com/damonwan1/SEP.
Keywords:
Data Mining: DM: Applications
Machine Learning: ML: Applications
