MMGIA: Gradient Inversion Attack Against Multimodal Federated Learning via Intermodal Correlation
Lele Zheng, Yang Cao, Leo Yu Zhang, Wei Wang, Yulong Shen, Xiaochun Cao
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 7967-7975.
https://doi.org/10.24963/ijcai.2025/886
Multimodal federated learning (MMFL) enables collaborative model training across multiple modalities, such as images and text, without requiring direct data sharing. However, the inherent correlations between modalities introduce new privacy vulnerabilities, making MMFL more susceptible to gradient inversion attacks. In this work, we propose MMGIA, an intermodal-correlation-driven gradient inversion attack that systematically exploits multimodal correlation to improve data reconstruction quality. MMGIA uses a two-stage optimization framework: the first stage reconstructs each modality independently with conventional gradient inversion techniques, while the second stage refines these reconstructions by aligning the modalities in a shared latent space through pre-trained feature extractors. To further improve reconstruction accuracy, we introduce a quality-weighted fusion strategy that dynamically integrates the multimodal embeddings into a global fused representation, which serves as a guiding signal for refining each modality's reconstruction. This ensures that high-quality reconstructions contribute more to the optimization, preserving well-reconstructed modalities while strengthening weaker ones. Extensive experiments across multiple multimodal scenarios show that MMGIA outperforms both the only existing multimodal attack and state-of-the-art single-modal attacks, revealing the heightened privacy risks in MMFL.
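To make the two-stage idea concrete, the following is a minimal illustrative sketch in PyTorch, not the paper's released code: the function names (gradient_match_loss, fused_embedding, correlation_loss), the cosine-distance objectives, and the softmax quality weighting are all our hypothetical choices, used only to show how a per-modality gradient-matching loss could be combined with a fused-representation alignment term.

```python
import torch
import torch.nn.functional as F

def gradient_match_loss(model, loss_fn, dummy_x, dummy_y, target_grads):
    # Stage 1: standard gradient inversion objective -- match the gradients
    # induced by the dummy data to the gradients observed from the client.
    pred = model(dummy_x)
    loss = loss_fn(pred, dummy_y)
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    return sum(1 - F.cosine_similarity(g.flatten(), t.flatten(), dim=0)
               for g, t in zip(grads, target_grads))

def fused_embedding(embeddings, quality_scores):
    # Quality-weighted fusion: modalities judged better reconstructed
    # (e.g., scored by their Stage-1 matching loss, passed here as scalar
    # tensors) receive larger weights in the global representation.
    w = torch.softmax(torch.stack(quality_scores), dim=0)
    return sum(wi * e for wi, e in zip(w, embeddings))

def correlation_loss(extractors, dummy_inputs, quality_scores):
    # Stage 2: embed each modality's current reconstruction with a
    # pre-trained feature extractor and pull it toward the fused
    # representation, which acts as a fixed guiding signal.
    embs = [f(x) for f, x in zip(extractors, dummy_inputs)]
    target = fused_embedding([e.detach() for e in embs],
                             quality_scores).detach()
    return sum(1 - F.cosine_similarity(e, target, dim=-1).mean()
               for e in embs)

# One plausible per-step objective (lam is a hypothetical trade-off weight):
#   total = gradient_match_loss(model, criterion, img, y, grads_img) \
#         + gradient_match_loss(model, criterion, txt, y, grads_txt) \
#         + lam * correlation_loss(extractors, [img, txt], scores)
```

Under these assumptions, the softmax over quality scores keeps the fused target dominated by the better-reconstructed modality, which reflects the stated intuition: strong reconstructions are not degraded by the alignment term, while weak ones are pulled toward the shared latent structure.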
Keywords:
Multidisciplinary Topics and Applications: MTA: Security and privacy
Machine Learning: ML: Federated learning
