SSML-QNet: Scale-Separative Metric Learning Quadruplet Network for Multi-modal Image Patch Matching

Xiuwei Zhang, Yi Sun, Yamin Han, Yanping Li, Hanlin Yin, Yinghui Xing, Yanning Zhang

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 4593-4601. https://doi.org/10.24963/ijcai.2023/511

Multi-modal image matching is challenging because the same scene can look very different across imaging modalities. The best-performing existing methods mainly focus on learning invariant and discriminative features for measuring the relation between multi-modal image pairs. However, these methods usually treat the features as a whole and largely overlook the fact that features of the same image pair at different scales may differ in similarity, which can lead to sub-optimal results. In this work, we propose a Scale-Separative Metric Learning Quadruplet network (SSML-QNet) for multi-modal image patch matching. Specifically, the proposed quadruplet network architecture allows SSML-QNet to extract both modality-relevant and modality-irrelevant features. The proposed Scale-Separative Metric Learning module then encodes the similarity of features at each scale separately, using a pyramid structure. At each scale, cross-modal consistent features are extracted and measured by coordinate attention followed by channel-wise attention. This makes our network robust to the appearance divergence caused by different imaging mechanisms. Experiments on four benchmark datasets (VIS-NIR, VIS-LWIR, Optical-SAR, and Brown) verify that the proposed SSML-QNet outperforms other state-of-the-art methods. Furthermore, cross-dataset transfer experiments on these four datasets show that the proposed method transfers well across datasets.
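The observation that one image pair can have different similarity at different scales can be illustrated with a toy example. The sketch below is not the SSML-QNet module: it has no learned weights, no attention, and no quadruplet branches. It assumes single-channel feature maps stored as nested lists, builds a pyramid by 2x2 average pooling, and uses cosine similarity as a stand-in metric. The function names (avg_pool2x2, cosine, per_scale_similarity) are hypothetical and chosen only for this illustration.

```python
import math


def avg_pool2x2(fmap):
    """Halve a 2-D feature map (list of rows) by averaging 2x2 blocks."""
    h, w = len(fmap) // 2, len(fmap[0]) // 2
    return [
        [
            (fmap[2 * i][2 * j] + fmap[2 * i][2 * j + 1]
             + fmap[2 * i + 1][2 * j] + fmap[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w)
        ]
        for i in range(h)
    ]


def cosine(a, b):
    """Cosine similarity between two feature maps of equal shape."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(x * x for x in flat_a))
    norm_b = math.sqrt(sum(y * y for y in flat_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_a * norm_b)


def per_scale_similarity(feat_a, feat_b, num_scales=3):
    """Return one similarity per pyramid level, finest scale first.

    Assumption: similarity is measured independently at each level
    instead of once on the full-resolution features.
    """
    sims = []
    a, b = feat_a, feat_b
    for _ in range(num_scales):
        sims.append(cosine(a, b))
        if len(a) < 2 or len(a[0]) < 2:
            break  # cannot pool any further
        a, b = avg_pool2x2(a), avg_pool2x2(b)
    return sims


# Two patches that disagree in fine detail but agree in coarse structure.
checker = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
inverted = [[1 - v for v in row] for row in checker]
sims = per_scale_similarity(checker, inverted)
```

In this toy case the two patches are orthogonal at the finest scale but identical after pooling, so a single whole-feature similarity would hide the agreement that the coarser levels expose.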
Keywords:
Machine Learning: ML: Classification
Computer Vision: CV: Machine learning for vision
Computer Vision: CV: Transfer, low-shot, semi- and un- supervised learning