Seeking Proxy Point via Stable Feature Space for Noisy Correspondence Learning

Yucheng Xie; Songyue Cai; Tao Tong; Ping Hu; Xiaofeng Zhu

doi:10.24963/ijcai.2025/231

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence

Main Track. Pages 2072-2080. https://doi.org/10.24963/ijcai.2025/231

To meet the growing demand for cross-modal training data, directly collecting multimodal data from the Internet has become prevalent. However, such data inevitably suffer from Noisy Correspondence. Previous works focused on recasting soft labels to mitigate noise's negative impact. We explore a novel perspective to solve this problem: pursuing proxy representation for noisy data to enable reliable feature learning. To this end, we propose a novel framework: Seeking Proxy Point via Stable Feature Space (SPS). This framework employs a fine-grained partitioning strategy to obtain a high-confidence reliable set. By imposing intermodal cross-transformation consistency constraints and intramodal metric consistency constraints, a stable feature space is constructed. Building on this foundation, SPS seeks proxy points for noisy data, enabling even noisy data to be accurately embedded into appropriate positions within the feature space. Combined with partial alignment for partially matched data pairs, SPS ultimately achieves robust learning under Noisy Correspondence. Experiments on three widely used cross-modal datasets demonstrate that SPS significantly outperforms previous methods. Our code is available at https://github.com/C-TeaRanger/SPS.
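The abstract does not spell out the partitioning rule or how a proxy point is computed, so the following is only an illustrative sketch under stated assumptions, not the SPS implementation. It assumes image and text embeddings are plain vectors, splits pairs into reliable, partial, and noisy sets using two hypothetical cosine-similarity thresholds (`low`, `high`), and stands in for proxy seeking with a similarity-weighted mean over the text embeddings of the nearest reliable pairs. The function names, thresholds, and the weighted-mean rule are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def partition(pairs, low=0.3, high=0.7):
    """Split pair indices into reliable / partial / noisy sets.

    The similarity criterion and both thresholds are hypothetical;
    the paper's fine-grained partitioning strategy may differ.
    """
    groups = {"reliable": [], "partial": [], "noisy": []}
    for i, (img, txt) in enumerate(pairs):
        s = cosine(img, txt)
        key = "reliable" if s >= high else "partial" if s >= low else "noisy"
        groups[key].append(i)
    return groups


def proxy_text(img, pairs, reliable_idx, k=2):
    """Hypothetical proxy point for a noisy image embedding.

    Takes the k reliable pairs whose image embeddings are closest to
    `img` and returns the similarity-weighted mean of their text
    embeddings. Returns None if no usable neighbour exists.
    """
    ranked = sorted(
        reliable_idx, key=lambda i: cosine(img, pairs[i][0]), reverse=True
    )[:k]
    weights = [max(cosine(img, pairs[i][0]), 0.0) for i in ranked]
    total = sum(weights)
    if not ranked or total == 0:
        return None
    dim = len(pairs[ranked[0]][1])
    return [
        sum(w * pairs[i][1][d] for w, i in zip(weights, ranked)) / total
        for d in range(dim)
    ]


# Toy (image, text) embedding pairs; values are made up for illustration.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # well matched
    ([0.0, 1.0], [0.0, 1.0]),  # well matched
    ([1.0, 0.0], [1.0, 2.0]),  # partially matched
    ([1.0, 0.0], [0.0, 1.0]),  # mismatched
]
groups = partition(pairs)
proxy = proxy_text(pairs[3][0], pairs, groups["reliable"], k=1)
```

In this toy setup the mismatched pair is not discarded: its image embedding borrows a text-side target from its nearest reliable neighbour, which is the general intuition behind embedding noisy data at an appropriate position in the feature space.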

Keywords:

Computer Vision: CV: Multimodal learning

Data Mining: DM: Information retrieval

Machine Learning: ML: Multi-modal learning

Machine Learning: ML: Robustness