Variational Multi-Modal Hypergraph Attention Network for Multi-Modal Relation Extraction
Qian Li, Cheng Ji, Shu Guo, Kun Peng, Qianren Mao, Shangguang Wang
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 8159-8167.
https://doi.org/10.24963/ijcai.2025/907
Multi-modal relation extraction (MMRE) is a challenging task that seeks to identify relationships between entities using both textual and visual information. However, existing methods struggle when multiple entity pairs in a single sentence share similar contextual information (e.g., identical text and image content), which amplifies the difficulty of distinguishing relationships and hinders accurate extraction. To address these limitations, we propose the variational multi-modal hypergraph attention network (VM-HAN), a novel and robust framework for MMRE. Unlike previous approaches, VM-HAN constructs a multi-modal hypergraph for each sentence-image pair, explicitly modeling high-order intra- and inter-modal correlations among the different entity pairs in the same context. This design captures intricate cross-modal interactions that prior methods often overlook, enabling a finer-grained understanding of entity relationships. Additionally, we introduce a variational hypergraph attention network (V-HAN), whose variational attention mechanism dynamically refines the hypergraph structure, enabling the model to handle the inherent ambiguity and complexity of multi-modal data. Comprehensive experiments on benchmark MMRE datasets demonstrate that VM-HAN achieves state-of-the-art performance, significantly surpassing existing methods in both accuracy and efficiency.
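To make the idea of a variational hypergraph attention layer concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: nodes (textual tokens and visual objects) are connected by hyperedges given as an incidence matrix, and each (node, hyperedge) attention score is sampled from a learned Gaussian via the reparameterization trick rather than computed deterministically. The class name `VariationalHypergraphAttention`, the Gaussian parameterization of the scores, and the incidence-matrix layout are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalHypergraphAttention(nn.Module):
    """Minimal sketch of one variational hypergraph attention layer.

    x: node features (num_nodes, in_dim); H: binary incidence matrix
    (num_nodes, num_edges). Attention scores over each hyperedge are
    sampled from a learned Gaussian (reparameterization trick), which
    lets the model softly refine the hypergraph structure.
    """

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)
        # Two heads: mean and log-variance of the unnormalized score.
        self.score_mu = nn.Linear(2 * out_dim, 1)
        self.score_logvar = nn.Linear(2 * out_dim, 1)

    def forward(self, x: torch.Tensor, H: torch.Tensor):
        z = self.proj(x)                                   # (N, D)
        # Initial hyperedge features: mean of member node features.
        deg_e = H.sum(dim=0).clamp(min=1).unsqueeze(1)     # (E, 1)
        e = (H.t() @ z) / deg_e                            # (E, D)

        # Pairwise (node, hyperedge) features for every incidence.
        N, E = H.shape
        pair = torch.cat(
            [z.unsqueeze(1).expand(N, E, -1),
             e.unsqueeze(0).expand(N, E, -1)], dim=-1)     # (N, E, 2D)

        mu = self.score_mu(pair).squeeze(-1)               # (N, E)
        logvar = self.score_logvar(pair).squeeze(-1)       # (N, E)
        # Reparameterization: stochastic score per incidence.
        score = mu + torch.randn_like(mu) * (0.5 * logvar).exp()

        # Mask non-incident pairs, normalize over each hyperedge.
        score = score.masked_fill(H == 0, float("-inf"))
        attn = torch.nan_to_num(torch.softmax(score, dim=0))

        # Node -> edge aggregation, then edge -> node message passing.
        e_new = attn.t() @ z                               # (E, D)
        deg_n = H.sum(dim=1, keepdim=True).clamp(min=1)    # (N, 1)
        out = F.relu((H @ e_new) / deg_n)                  # (N, D)

        # KL term pulling the score distribution toward N(0, 1).
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp())
        kl = (kl * H).sum() / H.sum().clamp(min=1)
        return out, kl


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(5, 16)            # 5 nodes: tokens + visual objects
    H = torch.zeros(5, 3)             # 3 hypothetical hyperedges
    H[[0, 1, 2], 0] = 1.0             # e.g., one entity pair plus context
    H[[1, 3], 1] = 1.0                # an intra-modal group
    H[[2, 3, 4], 2] = 1.0             # an inter-modal group
    layer = VariationalHypergraphAttention(16, 32)
    out, kl = layer(x, H)
    print(out.shape, kl.item())       # torch.Size([5, 32]) and a scalar KL
```

The sampled attention plus the KL penalty is one standard way to realize "variational attention": at training time the stochasticity regularizes the hyperedge structure, while at inference one can use the mean scores directly.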
Keywords:
Natural Language Processing: NLP: Information extraction
Natural Language Processing: NLP: Other
