Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos

Jiawei Liu, Zheng-Jun Zha, Xierong Zhu, Na Jiang

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 1012-1018. https://doi.org/10.24963/ijcai.2020/141

Person re-identification aims at identifying a given pedestrian across a network of non-overlapping cameras. Video-based person re-identification approaches have gained significant attention recently, extending image-based approaches by learning features from multiple frames. In this work, we propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos. It captures the common salient foreground regions among video frames and explores the long-range spatial-temporal context interdependencies within such regions, in order to learn a discriminative pedestrian representation. Specifically, multiple co-saliency learning modules within CSTNet use the correlated information across video frames to extract salient features from the task-relevant regions and suppress background interference. Moreover, multiple spatial-temporal interaction modules within CSTNet exploit the spatial and temporal long-range context interdependencies among these features, together with the correlation between spatial and temporal information, to enhance the feature representation. Extensive experiments on two benchmarks demonstrate the effectiveness of the proposed method.
Keywords:
Computer Vision: Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation
Computer Vision: Video: Events, Activities and Surveillance
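
The two ideas named in the abstract, cross-frame co-saliency weighting and long-range spatial-temporal interaction, can be illustrated with a small NumPy sketch. This is not the CSTNet implementation: the abstract does not specify the module internals, so the cosine-similarity co-saliency score, the softmax self-attention used for interaction, and all function names (`co_saliency_weights`, `spatio_temporal_interaction`, `video_descriptor`) are assumptions made purely for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def co_saliency_weights(feats):
    """Hypothetical co-saliency score (an assumption, not the paper's module).

    feats: array of shape (T, N, C) with T >= 2 frames, N spatial positions
    and C channels. A position scores high when it has a close match
    (cosine similarity) in every other frame, which favours regions that
    recur across the video. Returns weights of shape (T, N); each frame's
    weights sum to 1.
    """
    T, N, C = feats.shape
    normed = feats / (np.linalg.norm(feats, axis=-1, keepdims=True) + 1e-8)
    scores = np.zeros((T, N))
    for t in range(T):
        # Positions of all other frames, flattened frame-major: ((T-1)*N, C)
        others = np.delete(normed, t, axis=0).reshape(-1, C)
        sim = normed[t] @ others.T                       # (N, (T-1)*N)
        # Best match inside each other frame, averaged over those frames
        scores[t] = sim.reshape(N, T - 1, N).max(axis=2).mean(axis=1)
    return softmax(scores, axis=1)


def spatio_temporal_interaction(feats):
    """Hypothetical interaction step: self-attention over all T*N positions.

    Every position attends to every other position in every frame, so the
    output mixes long-range spatial and temporal context. A residual
    connection keeps the original features.
    """
    T, N, C = feats.shape
    x = feats.reshape(T * N, C)
    attn = softmax(x @ x.T / np.sqrt(C), axis=1)         # (T*N, T*N)
    return (x + attn @ x).reshape(T, N, C)


def video_descriptor(feats):
    """Weight features by co-saliency, refine them, and pool to one vector."""
    weights = co_saliency_weights(feats)                 # (T, N)
    salient = feats * weights[..., None]                 # down-weight background
    refined = spatio_temporal_interaction(salient)       # (T, N, C)
    return refined.sum(axis=1).mean(axis=0)              # (C,)
```

Under these assumptions, calling `video_descriptor` on a `(T, N, C)` array of per-frame feature maps (for example, flattened CNN activations) returns a single `C`-dimensional vector that could be compared across camera views with a distance metric.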