Categorical Attention: Fine-grained Language-guided Noise Filtering Network for Occluded Person Re-Identification

Minghui Chen, Dayan Wu, Chenxu Yang, Qinghang Su, Zheng Lin

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 801-809. https://doi.org/10.24963/ijcai.2025/90

Person Re-Identification (ReID) aims to match individuals across different camera views, but occlusions in real-world scenarios, such as vehicles or crowds, hinder feature extraction and matching. Current occluded ReID methods typically leverage visual augmentation techniques to mitigate the disruptive effects of occlusion-induced noise. However, relying solely on visual data fails to effectively filter out occlusion noise. In this paper, we introduce the Fine-grained Language-guided Noise Filtering Network (FLaN-Net) for occluded ReID. FLaN-Net innovatively employs a categorical attention mechanism to generate adaptive tokens that capture three distinct types of visual information: comprehensive descriptions of individuals, detailed visible attributes, and characteristics of occluding objects. A cross-attention mechanism then aligns these prompts with the image, guiding the model to focus on relevant regions. To generate robust and discriminative features for occluded pedestrians, we further introduce a dynamic weighting fusion module that integrates visual, textual, and cross-attention features according to their reliability. Experimental results demonstrate that FLaN-Net outperforms existing methods on occluded ReID benchmarks, offering a robust solution for challenging real-world conditions.
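
The following is a minimal, hypothetical PyTorch sketch (not the authors' code) of the two ideas the abstract describes: cross-attention that aligns language-derived categorical tokens with image features, and a reliability-based dynamic weighting fusion of the visual, textual, and cross-attention features. All module names, shapes, and hyperparameters here are assumptions for illustration only.

```python
# Hypothetical sketch of the components described in the abstract.
# Names and shapes are illustrative assumptions, not FLaN-Net's actual design.
import torch
import torch.nn as nn


class CrossAttentionAlign(nn.Module):
    """Categorical text tokens attend over image patch features."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_tokens, image_patches):
        # text_tokens:   (B, 3, D) e.g. whole-person / visible-attribute / occluder tokens
        # image_patches: (B, N, D) patch embeddings from a visual backbone
        aligned, _ = self.attn(query=text_tokens, key=image_patches, value=image_patches)
        return aligned  # (B, 3, D) prompt features grounded in image regions


class DynamicWeightingFusion(nn.Module):
    """Fuse visual, textual, and cross-attention features with learned reliability weights."""
    def __init__(self, dim: int):
        super().__init__()
        self.reliability = nn.Linear(dim, 1)  # scores each branch's feature

    def forward(self, visual_feat, text_feat, cross_feat):
        # each input: (B, D); stacked into (B, 3, D)
        feats = torch.stack([visual_feat, text_feat, cross_feat], dim=1)
        weights = torch.softmax(self.reliability(feats), dim=1)  # (B, 3, 1)
        return (weights * feats).sum(dim=1)  # (B, D) fused representation


if __name__ == "__main__":
    B, N, D = 2, 196, 256
    image_patches = torch.randn(B, N, D)
    text_tokens = torch.randn(B, 3, D)  # three categorical prompt tokens
    align = CrossAttentionAlign(D)
    fuse = DynamicWeightingFusion(D)
    cross_feat = align(text_tokens, image_patches).mean(dim=1)
    fused = fuse(image_patches.mean(dim=1), text_tokens.mean(dim=1), cross_feat)
    print(fused.shape)  # torch.Size([2, 256])
```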
Keywords:
Computer Vision: CV: Image and video retrieval 
Computer Vision: CV: Multimodal learning