MATCH: Modality-Calibrated Hypergraph Fusion Network for Conversational Emotion Recognition

Jiandong Shi, Ming Li, Lu Bai, Feilong Cao, Ke Lu, Jiye Liang

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 6164-6172. https://doi.org/10.24963/ijcai.2025/686

Multimodal emotion recognition aims to identify emotions by integrating multimodal features derived from spoken utterances. However, existing work focuses mainly on extracting potential intra- or cross-modal information and often neglects the calibration of conversational entities, leaving underutilized utterance information that is essential for accurately characterizing emotion. In addition, the lack of effective modeling of conversational patterns limits the ability to capture emotional pathways across contexts, modalities, and speakers, which impairs overall emotional understanding. In this study, we propose the modality-calibrated hypergraph fusion network (MATCH), which leverages multimodal fusion and hypergraph learning techniques to address these challenges. In particular, we introduce an entity calibration strategy that refines the representations of conversational entities at both the modality and context levels, allowing for deeper insights into emotion-related cues. Furthermore, we present an emotion-aligned hypergraph fusion method that incorporates a line graph to explore conversational patterns, facilitating flexible knowledge transfer across modalities through hyperedge-level and graph-level alignments. Experiments on two benchmark datasets demonstrate that MATCH outperforms state-of-the-art approaches.
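
To make the hypergraph-fusion idea concrete, the sketch below shows a standard hypergraph convolution (in the style of HGNN) applied to multimodal utterance nodes grouped by hyperedges. It is a minimal illustration under assumed names, shapes, and hyperedge groupings (same-utterance, same-speaker), not the paper's actual MATCH implementation, which also includes entity calibration and line-graph-based alignment.

import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    """One hypergraph convolution layer (hypothetical sketch):
    X' = ReLU(Dv^-1/2 H De^-1 H^T Dv^-1/2 X Theta), with unit hyperedge weights."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x: torch.Tensor, incidence: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, in_dim); incidence H: (num_nodes, num_hyperedges), binary.
        d_v = incidence.sum(dim=1).clamp(min=1)           # node degrees
        d_e = incidence.sum(dim=0).clamp(min=1)           # hyperedge degrees
        norm_x = x / d_v.sqrt().unsqueeze(1)              # Dv^-1/2 X
        msg = incidence.t() @ norm_x / d_e.unsqueeze(1)   # gather nodes -> hyperedges
        out = incidence @ msg / d_v.sqrt().unsqueeze(1)   # scatter hyperedges -> nodes
        return torch.relu(self.theta(out))

# Toy usage: 6 nodes are the text/audio/visual views of 2 utterances,
# joined by 3 assumed hyperedges (two same-utterance groups, one
# cross-utterance group, e.g., same speaker).
x = torch.randn(6, 16)
h = torch.zeros(6, 3)
h[0:3, 0] = 1   # hyperedge 0: modality views of utterance 1
h[3:6, 1] = 1   # hyperedge 1: modality views of utterance 2
h[::2, 2] = 1   # hyperedge 2: a cross-utterance group
layer = HypergraphConv(16, 16)
fused = layer(x, h)   # (6, 16) fused node representations

Each convolution pass lets information flow within and across modalities through the shared hyperedges, which is the basic mechanism the fusion network builds on.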
Keywords:
Machine Learning: ML: Applications
Machine Learning: ML: Multi-modal learning
Machine Learning: ML: Representation learning
Machine Learning: ML: Sequence and graph learning