Enhancing Multimodal Model Robustness Under Missing Modalities via Memory-Driven Prompt Learning

Yihan Zhao, Wei Xi, Xiao Fu, Jizhong Zhao

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 2458-2466. https://doi.org/10.24963/ijcai.2025/274

Existing multimodal models typically assume the availability of all modalities, leading to significant performance degradation when certain modalities are missing. Recent methods have introduced prompt learning to adapt pretrained models to incomplete data, achieving remarkable performance when the missing cases are consistent between training and inference. However, these methods rely heavily on distribution consistency and fail to compensate for missing modalities, limiting their ability to generalize to unseen missing cases. To address this issue, we propose Memory-Driven Prompt Learning, a framework that adaptively compensates for missing modalities through prompt learning. The compensation strategies are realized by two types of prompts: generative prompts and shared prompts. Generative prompts retrieve semantically similar samples from a predefined prompt memory that stores modality-specific semantic information, while shared prompts leverage available modalities to provide cross-modal compensation. Extensive experiments demonstrate the effectiveness of the proposed model, which achieves significant improvements across diverse missing-modality scenarios, with average performance increasing from 34.76% to 40.40% on MM-IMDb, 62.71% to 77.06% on Food101, and 60.40% to 62.77% on Hateful Memes. The code is available at https://github.com/zhao-yh20/MemPrompt.
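
To illustrate the idea described in the abstract, the following is a minimal sketch, not the authors' MemPrompt implementation, of how a modality-specific prompt memory might be queried by semantic similarity and combined with a shared prompt for cross-modal compensation. All class and function names (PromptMemory, build_prompts) and design details such as cosine-similarity retrieval and top-k weighting are illustrative assumptions.

```python
# Hypothetical sketch of memory-driven prompt retrieval; details (cosine
# retrieval, top-k weighting, prompt shapes) are assumptions, not the
# authors' released code.
import torch
import torch.nn.functional as F


class PromptMemory(torch.nn.Module):
    """Stores learnable key/prompt pairs for one modality (e.g., text or image)."""

    def __init__(self, num_slots: int, key_dim: int, prompt_len: int, embed_dim: int):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(num_slots, key_dim))
        self.prompts = torch.nn.Parameter(torch.randn(num_slots, prompt_len, embed_dim))

    def retrieve(self, query: torch.Tensor, top_k: int = 1) -> torch.Tensor:
        # query: (batch, key_dim) summary of the *available* modality
        sim = F.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)
        weights, idx = sim.topk(top_k, dim=-1)                    # (batch, top_k)
        weights = weights.softmax(dim=-1)
        selected = self.prompts[idx]                              # (batch, top_k, L, D)
        return (weights[..., None, None] * selected).sum(dim=1)   # (batch, L, D)


def build_prompts(avail_feat, text_missing, image_missing,
                  text_memory, image_memory, shared_prompt):
    """Compose prompts for a batch in which some modalities may be missing.

    avail_feat:    (batch, key_dim) feature of whichever modality is present.
    shared_prompt: (L, D) learnable prompt shared across modalities.
    """
    batch = avail_feat.size(0)
    prompts = [shared_prompt.unsqueeze(0).expand(batch, -1, -1)]
    if text_missing:
        prompts.append(text_memory.retrieve(avail_feat))   # compensate missing text
    if image_missing:
        prompts.append(image_memory.retrieve(avail_feat))  # compensate missing image
    # Prepend the concatenated prompts to the transformer input sequence.
    return torch.cat(prompts, dim=1)
```

In this sketch, the memory keys and prompts are trained end to end, so retrieval amounts to a soft nearest-neighbor lookup over the memory; the paper should be consulted for the actual retrieval and compensation mechanisms.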
Keywords:
Computer Vision: CV: Multimodal learning
Machine Learning: ML: Multi-modal learning