Progressive Modality-Adaptive Interactive Network for Multi-Modality Image Fusion
Chaowei Huang, Yaru Su, Huangbiao Xu, Xiao Ke
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 1161-1169.
https://doi.org/10.24963/ijcai.2025/130
Multi-modality image fusion (MMIF) integrates features from distinct modalities to enhance visual quality and improve downstream task performance. However, existing methods often overlook the sparsity variations and dynamic correlations between infrared and visible images, potentially limiting the utilization of both modalities. To address these challenges, we propose the Progressive Modality-Adaptive Interactive Network (PoMAI), a novel framework that not only dynamically adapts to the sparsity and structural disparities of each modality but also enhances inter-modal correlations, thereby optimizing fusion quality. The training process consists of two stages: in the first stage, the Neighbor-Group Matching Model (NGMM) models the high sparsity of infrared features, while the Context-Aware Modeling Network (CAMN) captures rich structural details in visible features, jointly refining modality-specific characteristics for fusion. In the second stage, the Modality-Interactive Compensation Module (MICM) refines inter-modal correlations via a dynamic compensation mechanism, with the first-stage modules frozen so that MICM focuses solely on the compensation task. Extensive experiments on benchmark datasets demonstrate that PoMAI surpasses state-of-the-art methods in fusion quality and excels in downstream tasks.
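
For concreteness, below is a minimal, hypothetical PyTorch sketch of the two-stage training protocol described in the abstract: stage one trains the modality-specific branches (NGMM for infrared, CAMN for visible), and stage two freezes them and optimizes only the compensation module (MICM). The module internals, losses, and hyperparameters are placeholders for illustration only, not the authors' implementation.

    # Hypothetical sketch of the two-stage training protocol; module names follow
    # the paper's terminology, but the internals here are placeholder layers.
    import torch
    import torch.nn as nn

    class PlaceholderBranch(nn.Module):
        """Stand-in for NGMM (infrared branch) or CAMN (visible branch)."""
        def __init__(self, channels=16):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
        def forward(self, x):
            return self.net(x)

    class PlaceholderMICM(nn.Module):
        """Stand-in for the Modality-Interactive Compensation Module:
        fuses the two branch features and predicts a single fused image."""
        def __init__(self, channels=16):
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, 1, 3, padding=1),
            )
        def forward(self, f_ir, f_vis):
            return self.fuse(torch.cat([f_ir, f_vis], dim=1))

    ngmm, camn, micm = PlaceholderBranch(), PlaceholderBranch(), PlaceholderMICM()
    loss_fn = nn.L1Loss()            # placeholder reconstruction loss
    ir = torch.rand(4, 1, 64, 64)    # toy infrared batch
    vis = torch.rand(4, 1, 64, 64)   # toy visible batch

    # Stage 1: train the modality-specific branches (NGMM + CAMN).
    opt1 = torch.optim.Adam(list(ngmm.parameters()) + list(camn.parameters()), lr=1e-4)
    for _ in range(2):  # a few toy iterations
        f_ir, f_vis = ngmm(ir), camn(vis)
        # Placeholder stage-1 objective: each branch reconstructs its own modality.
        loss = (loss_fn(f_ir.mean(1, keepdim=True), ir)
                + loss_fn(f_vis.mean(1, keepdim=True), vis))
        opt1.zero_grad(); loss.backward(); opt1.step()

    # Stage 2: freeze the stage-1 modules and train only MICM.
    for p in list(ngmm.parameters()) + list(camn.parameters()):
        p.requires_grad_(False)
    opt2 = torch.optim.Adam(micm.parameters(), lr=1e-4)
    for _ in range(2):
        with torch.no_grad():        # stage-1 features are fixed
            f_ir, f_vis = ngmm(ir), camn(vis)
        fused = micm(f_ir, f_vis)
        # Placeholder fusion objective: keep the fused image close to both inputs.
        loss = loss_fn(fused, ir) + loss_fn(fused, vis)
        opt2.zero_grad(); loss.backward(); opt2.step()

The key point illustrated is the freezing step: in stage two, gradients flow only through MICM, so the compensation module is trained in isolation on top of fixed modality-specific features.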
Keywords:
Computer Vision: CV: Low-level Vision
Computer Vision: CV: Multimodal learning
Computer Vision: CV: Transfer, low-shot, semi- and un-supervised learning
