Multimodal Fake News Detection: MFND Dataset and Shallow-Deep Multitask Learning

Ye Zhu, Yunan Wang, Zitong Yu

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 8012-8020. https://doi.org/10.24963/ijcai.2025/891

Multimodal news carries rich information and is highly vulnerable to deepfake manipulation attacks. To counter the latest image and text generation methods, we present a new Multimodal Fake News Detection dataset (MFND) containing 11 manipulation types, designed to detect and localize highly realistic fake news. Furthermore, we propose a Shallow-Deep Multitask Learning (SDML) model for fake news, which fully exploits unimodal and mutual-modal features to mine the intrinsic semantics of news. For shallow inference, we propose momentum distillation-based light-punishment contrastive learning for fine-grained image-text semantic alignment in a unified space, together with an adaptive cross-modal fusion module to enhance the mutual-modal features. For deep inference, we design a two-branch framework that augments the image and text unimodal features and merges each with the mutual-modal features, yielding four predictions via dedicated detection and localization projections. Experiments on both mainstream datasets and our proposed dataset demonstrate the superiority of the model. Code and dataset are released at https://github.com/yunan-wang33/sdml.
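To make the shallow-inference idea concrete, the sketch below shows a generic momentum-distilled image-text contrastive objective: the one-hot matched-pair targets are blended with soft similarity targets from a momentum (EMA) teacher encoder, in the style popularized by ALBEF. This is only an illustrative assumption, not the paper's exact light-punishment formulation; all function names and hyperparameter values here are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Scale rows to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def log_softmax(z, axis=-1):
    # Numerically stable log-softmax.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def softmax(z, axis=-1):
    return np.exp(log_softmax(z, axis=axis))

def ema_update(teacher, student, m=0.995):
    # Momentum (EMA) update of the teacher encoder's parameters.
    return m * teacher + (1.0 - m) * student

def momentum_contrastive_loss(img, txt, img_m, txt_m, tau=0.07, alpha=0.4):
    """Image-text contrastive loss whose targets blend one-hot
    matched-pair labels with soft targets from a momentum teacher;
    alpha weights the distillation term (illustrative values)."""
    img, txt = l2_normalize(img), l2_normalize(txt)
    img_m, txt_m = l2_normalize(img_m), l2_normalize(txt_m)
    n = img.shape[0]
    hard = np.eye(n)                        # i-th image matches i-th text
    # Student similarity logits, both retrieval directions.
    logits_i2t = img @ txt.T / tau
    logits_t2i = txt @ img.T / tau
    # Soft targets from the (frozen) momentum teacher.
    soft_i2t = softmax(img_m @ txt_m.T / tau)
    soft_t2i = softmax(txt_m @ img_m.T / tau)
    t_i2t = alpha * soft_i2t + (1 - alpha) * hard
    t_t2i = alpha * soft_t2i + (1 - alpha) * hard
    # Soft cross-entropy, averaged over the batch and both directions.
    loss_i2t = -(t_i2t * log_softmax(logits_i2t)).sum(axis=1).mean()
    loss_t2i = -(t_t2i * log_softmax(logits_t2i)).sum(axis=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

In practice the teacher embeddings would come from EMA copies of the image and text encoders (updated with something like `ema_update`), and the loss would be one term of the multitask objective alongside the detection and localization heads.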
Keywords:
Multidisciplinary Topics and Applications: MTA: Security and privacy
Computer Vision: CV: Multimodal learning
Multidisciplinary Topics and Applications: MTA: News and media