ECG2TOK: ECG Pre-Training with Self-Distillation Semantic Tokenizers
Xiaoyan Yuan, Wei Wang, Han Liu, Jian Chen, Xiping Hu
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
AI and Social Good. Pages 9990-9998.
https://doi.org/10.24963/ijcai.2025/1110
Self-supervised learning (SSL) has garnered increasing attention in electrocardiogram (ECG) analysis for its effectiveness in resource-limited settings. Existing state-of-the-art SSL methods rely on reconstructing time-frequency details, but because ECG signals are inherently redundant and vary across individuals, these approaches often yield suboptimal performance. In contrast, discrete label prediction is a superior pre-training objective because it encourages models to efficiently abstract the high-level semantics of ECG. However, the continuity and significant variability of ECG signals make it challenging to generate semantically meaningful discrete labels. To address this issue, we propose an ECG pre-training framework with a self-distillation semantic tokenizer (ECG2TOK), which maps continuous ECG signals into discrete labels for self-supervised training. Specifically, the tokenizer extracts semantically aware ECG embeddings through self-distillation and performs online clustering to generate semantically rich discrete labels. The SSL model is then trained with masking strategies and discrete label prediction to facilitate the abstraction of high-level semantic representations. We evaluate ECG2TOK on six downstream tasks, demonstrating that it efficiently achieves state-of-the-art performance, with up to a 30.73% AUC increase in low-resource scenarios. Moreover, visualization experiments show that the discrete labels generated by ECG2TOK exhibit consistent semantics closely associated with clinical features. Our code is available at https://github.com/YXYanova/ECG2TOK.
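The two steps named in the abstract, online clustering of embeddings into discrete labels and masked discrete label prediction, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the self-distillation encoder is omitted and the embeddings are taken as given, the online clustering is assumed to be a nearest-centroid assignment with an exponential-moving-average codebook update, and the names `assign_tokens`, `ema_update`, `masked_label_loss`, and `decay` are hypothetical.

```python
import numpy as np


def assign_tokens(embeddings, codebook):
    """Return, for each embedding, the index of its nearest codebook vector."""
    # Squared Euclidean distance between every embedding and every code,
    # computed by broadcasting to shape (num_embeddings, num_codes).
    diff = embeddings[:, None, :] - codebook[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    return dist.argmin(axis=1)


def ema_update(codebook, embeddings, tokens, decay=0.99):
    """Assumed online clustering step: move each used code toward the
    mean of the embeddings currently assigned to it."""
    updated = codebook.copy()
    for k in np.unique(tokens):
        batch_mean = embeddings[tokens == k].mean(axis=0)
        updated[k] = decay * codebook[k] + (1.0 - decay) * batch_mean
    return updated


def masked_label_loss(logits, tokens, mask):
    """Cross-entropy between predicted logits and discrete labels,
    averaged over the masked positions only."""
    # Numerically stable log-softmax over the code dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(len(tokens)), tokens]
    return -(picked * mask).sum() / max(mask.sum(), 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for tokenizer embeddings: 8 signal patches, 4 dimensions each.
    embeddings = rng.normal(size=(8, 4))
    codebook = rng.normal(size=(3, 4))        # 3 discrete labels
    tokens = assign_tokens(embeddings, codebook)
    codebook = ema_update(codebook, embeddings, tokens)
    mask = rng.random(8) < 0.5                # positions hidden from the model
    logits = rng.normal(size=(8, 3))          # stand-in for model predictions
    print(tokens, masked_label_loss(logits, tokens, mask))
```

In this sketch the loss is computed only at masked positions, so the model is rewarded for inferring the label of a hidden patch from its context rather than for reconstructing the waveform itself.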
Keywords:
Data Mining: General
Humans and AI: General
Knowledge Representation and Reasoning: General
Machine Learning: General
