GSDNet: Revisiting Incomplete Multimodality-Diffusion Emotion Recognition from the Perspective of Graph Spectrum
GSDNet: Revisiting Incomplete Multimodality-Diffusion Emotion Recognition from the Perspective of Graph Spectrum
Yuntao Shou, Jun Yao, Tao Meng, Wei Ai, Cen Chen, Keqin Li
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 6182-6190.
https://doi.org/10.24963/ijcai.2025/688
Multimodal Emotion Recognition (MER) combines technologies from multiple fields (e.g., computer vision, natural language processing, and audio signal processing), aiming to infer an individual's emotional state by analyzing information from different sources (i.e., video, audio, and text). Compared with single modality, by fusing complementary semantic information from different modalities, the model can obtain more robust knowledge representation. However, the modality missing problem limits the performance of MERC in practical scenarios. Recent work has achieved impressive performance on modality completion using graph neural networks and diffusion models, respectively. This inspires us to combine these two dimensions in the completion network to obtain more powerful representation capabilities. However, we argue that directly running a full-rank score-based diffusion model on the entire graph adjacency matrix space may adversely affect the learning process of the diffusion model. This is because the model assumes a direct relationship between each pair of nodes and ignores local structural features and sparse connections between nodes, thereby significantly reducing the quality of the generated data. Based on the above ideas, we propose a novel Graph Spectral Diffusion Network (GSDNet), which utilizes a low-rank score-based diffusion model to map Gaussian noise to the graph spectral distribution space of missing modalities and recover the missing data according to its original distribution. Extensive experiments have demonstrated that GSDNet achieves state-of-the-art emotion recognition performance in various modality loss scenarios.
Keywords:
Machine Learning: ML: Multi-modal learning
Machine Learning: ML: Sequence and graph learning
