Bi-DiffCD: Bidirectional Diffusion Guided Collaborative Change Detection for Arbitrary-Modal Remote Sensing Images

Jingyu Zhao, Jiahui Qu, Wenqian Dong

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 2449-2457. https://doi.org/10.24963/ijcai.2025/273

Change detection aims to identify land cover changes by analyzing multitemporal images of the same area. However, it may be difficult to obtain high-quality multitemporal images of the same modality in real dynamic scenarios. The rapid development of remote sensing technology enables collaborative observation with multimodal images, but methods designed for uni-modal images struggle to overcome the modal discrepancy and to exploit the complementary advantages of different modalities for detection. To this end, we propose a bidirectional diffusion guided collaborative change detection model (Bi-DiffCD) for arbitrary-modal images, which eliminates the modal discrepancy between arbitrary-modal images through bidirectional diffusion and makes full use of multilevel complementary features to improve detection accuracy. Specifically, a conditional diffusion-based bidirectional modal alignment module (CDBMA) is designed to align the modal attributes bidirectionally and step by step while preserving the complementary multimodal features. Furthermore, a multilevel complementary feature collaborative change detection module (MLCCD) is proposed to integrate the multilevel enhanced complementary change information from the transformed images and the latent features for change detection. Experiments on three widely used multimodal datasets and one self-constructed multimodal dataset demonstrate the effectiveness of the proposed method across different combinations of modalities. Code is available at https://github.com/Jiahuiqu/Bi-DiffCD.
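The abstract states that CDBMA aligns the two modalities step by step with conditional diffusion in both directions, but it gives no equations. As a rough, hedged illustration of the generic machinery such a module could build on, the sketch below implements standard DDPM forward noising and an ancestral reverse chain whose noise predictor is conditioned on the image of the other modality; running the chain once per direction (A to B, and B to A) gives a bidirectional translation. The linear noise schedule, the function names (`q_sample`, `predict_x0`, `reverse_step`, `translate`) and the placeholder predictor `zero_eps` are hypothetical choices for illustration, not the authors' implementation; a real model would use trained conditional networks on image tensors instead of flat lists.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical noise predictor signature: (x_t, t, condition image) -> predicted noise.
EpsModel = Callable[[List[float], int, Sequence[float]], List[float]]


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Standard DDPM linear schedule (an assumption; the paper's schedule is not given)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0: Sequence[float], t: int, alpha_bars: Sequence[float],
             noise: Sequence[float]) -> List[float]:
    """Closed-form forward noising: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * noise."""
    a, s = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    return [a * x + s * n for x, n in zip(x0, noise)]


def predict_x0(x_t: Sequence[float], t: int, alpha_bars: Sequence[float],
               eps_hat: Sequence[float]) -> List[float]:
    """Invert the forward formula given a noise estimate."""
    a, s = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    return [(x - s * e) / a for x, e in zip(x_t, eps_hat)]


def reverse_step(x_t: List[float], t: int, betas: Sequence[float],
                 alpha_bars: Sequence[float], cond: Sequence[float],
                 eps_model: EpsModel, rng: random.Random) -> List[float]:
    """One DDPM ancestral step x_t -> x_{t-1}, with the predictor conditioned on `cond`."""
    eps_hat = eps_model(x_t, t, cond)
    beta = betas[t]
    coef = beta / math.sqrt(1.0 - alpha_bars[t])
    # mu = (x_t - beta_t / sqrt(1 - ab_t) * eps) / sqrt(alpha_t), with alpha_t = 1 - beta_t
    mean = [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(x_t, eps_hat)]
    if t == 0:
        return mean
    sigma = math.sqrt(beta)  # common choice sigma_t^2 = beta_t
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def translate(cond: Sequence[float], eps_model: EpsModel, betas: Sequence[float],
              alpha_bars: Sequence[float], rng: random.Random) -> List[float]:
    """Run the full reverse chain from Gaussian noise, guided by the source-modality image."""
    x = [rng.gauss(0.0, 1.0) for _ in cond]
    for t in reversed(range(len(betas))):
        x = reverse_step(x, t, betas, alpha_bars, cond, eps_model, rng)
    return x


def zero_eps(x: List[float], t: int, cond: Sequence[float]) -> List[float]:
    """Hypothetical placeholder predictor; a trained conditional network would go here."""
    return [0.0] * len(x)


if __name__ == "__main__":
    rng = random.Random(0)
    betas = linear_beta_schedule(50)
    alpha_bars = cumulative_alphas(betas)
    img_a, img_b = [0.2, 0.4, 0.6], [0.9, 0.1, 0.3]
    # One conditional chain per direction; both use the placeholder predictor here.
    b_like_a = translate(img_a, zero_eps, betas, alpha_bars, rng)  # A rendered toward B
    a_like_b = translate(img_b, zero_eps, betas, alpha_bars, rng)  # B rendered toward A
    print(len(b_like_a), len(a_like_b))
```

Running one chain per direction is what makes the sketch "bidirectional": each image is pushed toward the other modality, so a downstream change detector could compare pairs of the same modality in both directions.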
Keywords:
Computer Vision: CV: Recognition (object detection, categorization)
Computer Vision: CV: Multimodal learning