Vision Shared and Representation Isolated Network for Person Search

Yang Liu, Yingping Li, Chengyu Kong, Yuqiu Kong, Shenglan Liu, Feilong Wang

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 1216-1222. https://doi.org/10.24963/ijcai.2022/170

Person search is a widely studied computer vision task that aims to jointly solve pedestrian detection and person re-identification (re-ID) in panoramic scenes. However, pedestrian detection focuses on the features common to all pedestrians, whereas person re-identification attempts to extract the features that discriminate one pedestrian from another. This inherent conflict greatly restricts research on one-stage person search methods. To address this issue, we propose a Vision Shared and Representation Isolated (VSRI) network that decouples the two conflicting subtasks by constructing an independent representation for each. To enhance the discrimination of the re-ID representation, a Multi-Level Feature Fusion (MLFF) module is proposed. The MLFF adopts a Spatial Pyramid Feature Fusion (SPFF) module to obtain diverse features from the stem network. Moreover, the multi-head self-attention mechanism is employed to construct a Multi-head Attention Driven Extraction (MADE) module, and a cascaded convolution unit is adopted to devise a Feature Decomposition and Cascaded Integration (FDCI) module; together these help the MLFF obtain more discriminative representations of the pedestrians. The proposed method outperforms the state-of-the-art methods on the mainstream datasets.
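
The idea of a shared stem feeding two isolated representations can be illustrated with a toy sketch. This is not the VSRI architecture: the class `ToySharedStemModel`, its random linear layers, the tanh activation, and the parameter-free attention (identity query/key/value projections rather than learned ones) are all assumptions made purely for illustration, using only the Python standard library.

```python
import math
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(tokens, num_heads):
    """Scaled dot-product self-attention, one pass per head.

    Assumption: queries, keys and values are the raw feature slices
    (no learned projections), which keeps the sketch short.
    """
    dim = len(tokens[0])
    assert dim % num_heads == 0
    head_dim = dim // num_heads
    outputs = [[] for _ in tokens]
    for h in range(num_heads):
        chunk = [t[h * head_dim:(h + 1) * head_dim] for t in tokens]
        for i, query in enumerate(chunk):
            scores = [
                sum(q * k for q, k in zip(query, key)) / math.sqrt(head_dim)
                for key in chunk
            ]
            weights = softmax(scores)
            for d in range(head_dim):
                outputs[i].append(
                    sum(w * value[d] for w, value in zip(weights, chunk))
                )
    return outputs


class ToySharedStemModel:
    """Shared stem, then two heads that share no parameters (illustrative only)."""

    def __init__(self, in_dim, feat_dim, num_heads, seed=0):
        rng = random.Random(seed)
        self.num_heads = num_heads
        self.stem = random_matrix(feat_dim, in_dim, rng)         # shared vision features
        self.det_head = random_matrix(1, feat_dim, rng)          # detection branch
        self.reid_head = random_matrix(feat_dim, feat_dim, rng)  # re-ID branch

    def forward(self, regions):
        # The stem runs once; both branches read the same shared features.
        shared = [[math.tanh(v) for v in matvec(self.stem, r)] for r in regions]
        det_scores = [matvec(self.det_head, f)[0] for f in shared]
        attended = multi_head_self_attention(shared, self.num_heads)
        embeddings = []
        for feature in attended:
            emb = matvec(self.reid_head, feature)
            norm = max(math.sqrt(sum(v * v for v in emb)), 1e-12)
            embeddings.append([v / norm for v in emb])  # unit-length re-ID vector
        return det_scores, embeddings


model = ToySharedStemModel(in_dim=6, feat_dim=4, num_heads=2)
regions = [[0.1 * (i + j) for j in range(6)] for i in range(3)]
det_scores, embeddings = model.forward(regions)
```

Here each candidate region yields one detection score and one unit-length re-ID embedding. The two heads hold separate weights, so in a trained model an update to one branch would not directly alter the other branch's parameters, while both still depend on the shared stem.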
Keywords:
Computer Vision: Image and Video retrieval 
Computer Vision: Applications
Computer Vision: Recognition (object detection, categorization)