Probabilistic Multimodal Learning with von Mises-Fisher Distributions

Probabilistic Multimodal Learning with von Mises-Fisher Distributions

Peng Hu, Yang Qin, Yuanbiao Gou, Yunfan Li, Mouxing Yang, Xi Peng

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 5390-5398. https://doi.org/10.24963/ijcai.2025/600

Multimodal learning is pivotal for the advancement of artificial intelligence, enabling machines to integrate complementary information from diverse data sources for holistic perception and understanding. Despite significant progress, existing methods struggle with challenges such as noisy inputs, noisy correspondence, and the inherent uncertainty of multimodal data, limiting their reliability and robustness. To address these issues, this paper presents a novel Probabilistic Multimodal Learning framework (PML) that models each data point as a von Mises-Fisher (vMF) distribution, effectively capturing intrinsic uncertainty and enabling robust fusion. Unlike traditional Gaussian-based models, PML learns directional representation with a concentration parameter to quantify reliability directly, enhancing stability and interpretability. To enhance discrimination, we propose a von Mises-Fisher Prototypical Contrastive Learning paradigm (vMF-PCL), which projects data onto a hypersphere by pulling within-class samples closer to their class prototype while pushing between-class prototypes apart, adaptively learning the reliability estimations. Building upon the estimated reliability, we develop a Reliable Multimodal Fusion mechanism (RMF) that dynamically adjusts the contribution and conflict of each modality, ensuring robustness against noisy data, noisy correspondence, and uncertainty. Extensive experiments on nine benchmarks demonstrate the superiority of PML, consistently outperforming 14 state-of-the-art methods. Code is available at https://github.com/XLearning-SCU/2025-IJCAI-PML.
Keywords:
Machine Learning: ML: Multi-modal learning
Computer Vision: CV: Multimodal learning
Machine Learning: ML: Multi-view learning
Machine Learning: ML: Supervised Learning