Spatial-Spectral Similarity-Guided Fusion Network for Pansharpening

Jiazhuang Xiong, Yongshan Zhang, Xinxin Wang, Lefei Zhang

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 2089-2097. https://doi.org/10.24963/ijcai.2025/233

Pansharpening fuses low-resolution multispectral (LRMS) images with high-resolution panchromatic (PAN) images to generate high-resolution multispectral (HRMS) images that preserve both spatial and spectral information. Most deep pansharpening methods face challenges in cross-modal feature extraction and fusion, as well as in exploiting the similarities between the fused image and both the PAN and LRMS images. In this paper, we propose a spatial-spectral similarity-guided fusion network (S3FNet) for pansharpening. The architecture is composed of three parts. Specifically, a shallow feature extraction layer learns initial spatial, spectral, and fused features from the PAN and LRMS images. Then, a multi-branch asymmetric encoder, consisting of spatial, spectral, and fusion branches, generates the corresponding high-level features at different scales. Finally, a multi-scale reconstruction decoder, equipped with a well-designed cross-feature multi-head attention fusion block, processes the intermediate feature maps to generate HRMS images. To ensure that the HRMS images retain as much spatial and spectral information as possible, a similarity-constrained loss is defined for network training. Extensive experiments demonstrate the effectiveness of our S3FNet over state-of-the-art methods. The code is released at https://github.com/ZhangYongshan/S3FNet.
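The abstract describes the three-part design but not its implementation details. Below is a minimal PyTorch sketch of how such a three-branch pipeline could be wired together, purely for illustration: every module name, channel width, attention configuration, and loss weight here is an assumption, not the authors' code (the official implementation is at https://github.com/ZhangYongshan/S3FNet).

```python
# Illustrative sketch only: names, sizes, and loss weights are assumptions,
# not the authors' implementation (see https://github.com/ZhangYongshan/S3FNet).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossFeatureAttentionFusion(nn.Module):
    """Hypothetical stand-in for the cross-feature multi-head attention
    fusion block: fused features attend to spatial and spectral features."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, fused, spatial, spectral):
        b, c, h, w = fused.shape
        q = fused.flatten(2).transpose(1, 2)                  # (B, HW, C)
        kv = torch.cat([spatial.flatten(2).transpose(1, 2),
                        spectral.flatten(2).transpose(1, 2)], dim=1)
        out, _ = self.attn(q, kv, kv)                         # cross-attention
        out = self.proj(out).transpose(1, 2).reshape(b, c, h, w)
        return fused + out                                    # residual fusion


class S3FNetSketch(nn.Module):
    """Three-part pipeline: shallow extraction -> multi-branch encoder ->
    attention-fused reconstruction. A single scale is shown for brevity."""

    def __init__(self, ms_bands=4, dim=32):
        super().__init__()
        # (1) Shallow feature extraction for spatial (PAN), spectral (LRMS),
        #     and fused (concatenated) inputs.
        self.spatial_in = nn.Conv2d(1, dim, 3, padding=1)
        self.spectral_in = nn.Conv2d(ms_bands, dim, 3, padding=1)
        self.fused_in = nn.Conv2d(1 + ms_bands, dim, 3, padding=1)

        # (2) One conv block per encoder branch (the paper describes an
        #     asymmetric multi-scale design; this is a single-scale proxy).
        def branch():
            return nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU())

        self.spatial_enc, self.spectral_enc, self.fusion_enc = branch(), branch(), branch()
        # (3) Decoder: cross-feature attention fusion + output projection.
        self.fuse = CrossFeatureAttentionFusion(dim)
        self.out = nn.Conv2d(dim, ms_bands, 3, padding=1)

    def forward(self, pan, lrms):
        # Upsample LRMS to the PAN resolution before fusion.
        ms_up = F.interpolate(lrms, size=pan.shape[-2:], mode="bicubic",
                              align_corners=False)
        spa = self.spatial_enc(self.spatial_in(pan))
        spe = self.spectral_enc(self.spectral_in(ms_up))
        fus = self.fusion_enc(self.fused_in(torch.cat([pan, ms_up], dim=1)))
        # Predict a residual on top of the upsampled LRMS.
        return ms_up + self.out(self.fuse(fus, spa, spe))


def similarity_constrained_loss(pred, gt, pan, lrms, alpha=0.1, beta=0.1):
    """One plausible form of the similarity-constrained loss: supervised
    reconstruction plus spatial (PAN) and spectral (LRMS) similarity terms."""
    recon = F.l1_loss(pred, gt)
    spatial = F.l1_loss(pred.mean(dim=1, keepdim=True), pan)       # vs. PAN
    down = F.interpolate(pred, size=lrms.shape[-2:], mode="bicubic",
                         align_corners=False)
    spectral = F.l1_loss(down, lrms)                               # vs. LRMS
    return recon + alpha * spatial + beta * spectral


if __name__ == "__main__":
    pan, lrms = torch.randn(2, 1, 64, 64), torch.randn(2, 4, 16, 16)
    hrms = S3FNetSketch()(pan, lrms)
    print(hrms.shape)  # torch.Size([2, 4, 64, 64])
```

The cross-attention block lets the fused branch query both single-modality branches, which is one plausible reading of the "cross-feature multi-head attention fusion" described above; the loss sketch pairs a supervised reconstruction term with spatial similarity to the PAN image and spectral similarity to the LRMS image, mirroring the stated goal of the similarity-constrained loss.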
Keywords:
Computer Vision: CV: Low-level Vision
Computer Vision: CV: Multimodal learning