Wave-wise Discriminative Tracking by Phase-Amplitude Separation, Augmentation and Mixture

Huibin Tan, Mingyu Cao, Kun Hu, Xihuai He, Zhe Wang, Hao Li, Long Lan, Mengzhu Wang

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 1927-1935. https://doi.org/10.24963/ijcai.2025/215

Distinguishing key features in complex visual tasks is challenging. One approach treats image patches (tokens) as waves: by using both phase and amplitude, it captures richer semantics and specific invariances than pixel-based methods, and it allows features to be fused across regions into a holistic image representation. Building on this idea, we propose the Wave-wise Discriminative Transformer Tracker (WDT). During tracking, WDT represents features via phase-amplitude separation, augmentation, and mixture. First, we design a Mutual Exclusive Phase-Amplitude Extractor (MEPAE) that separates phase and amplitude features with distinct semantics, representing spatial target information and background brightness, respectively. Then, Wave-wise Feature Augmentation is carried out by two submodules: Phase-Amplitude Feature Augmentation and Mixture. The augmentation module disrupts the separated features within the same batch, and the mixture module recombines them to generate positive and negative waves, while the original features are aggregated into the original wave. Positive waves share the original phase but have different amplitudes, whereas negative waves have different phase components. Finally, self-supervised and tracking-supervised losses guide global and local representation learning over the original, positive, and negative waves, enhancing wave-level discrimination. Experiments on five benchmarks demonstrate the effectiveness of our method.
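
The recombination step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a wave token is the complex number amplitude * exp(i * phase), that the batch-level disruption can be stood in for by a cyclic shift of the batch, and that a negative wave keeps its own amplitude while taking its phase from another sample (the abstract only states that the phase differs). The names `to_wave`, `roll`, and `make_waves` are hypothetical.

```python
import cmath

def to_wave(amplitude, phase):
    """Combine per-channel amplitude and phase into complex wave tokens."""
    return [a * cmath.exp(1j * p) for a, p in zip(amplitude, phase)]

def roll(batch, shift=1):
    """Cyclically shift a batch so each sample is paired with another sample.

    Assumption: this stands in for the batch-level disruption in the abstract.
    """
    shift %= len(batch)
    return batch[-shift:] + batch[:-shift]

def make_waves(amplitudes, phases):
    """Build original, positive, and negative waves for every sample in a batch."""
    swapped_amplitudes = roll(amplitudes)
    swapped_phases = roll(phases)
    # Original wave: the sample's own amplitude and phase.
    original = [to_wave(a, p) for a, p in zip(amplitudes, phases)]
    # Positive wave: same phase, amplitude taken from another sample.
    positive = [to_wave(a, p) for a, p in zip(swapped_amplitudes, phases)]
    # Negative wave (assumption): own amplitude, phase taken from another sample.
    negative = [to_wave(a, p) for a, p in zip(amplitudes, swapped_phases)]
    return original, positive, negative

# Toy batch: two samples, two channels each (illustrative values only).
amplitudes = [[1.0, 2.0], [3.0, 0.5]]
phases = [[0.3, 1.5], [2.5, -0.7]]
original, positive, negative = make_waves(amplitudes, phases)
```

In this toy batch, each positive wave has the same phase as its original wave but the other sample's amplitude, so a loss that pulls positives toward the original and pushes negatives away would reward features that depend on phase rather than amplitude.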
Keywords:
Computer Vision: CV: Representation learning
Computer Vision: CV: Machine learning for vision
Computer Vision: CV: Motion and tracking