Targeted Multimodal Sentiment Classification based on Coarse-to-Fine Grained Image-Target Matching

Jianfei Yu; Jieming Wang; Rui Xia; Junjie Li

doi:10.24963/ijcai.2022/622

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence

Main Track. Pages 4482-4488. https://doi.org/10.24963/ijcai.2022/622

Targeted Multimodal Sentiment Classification (TMSC) aims to identify the sentiment polarity of each target mentioned in a sentence-image pair. Existing TMSC methods fail to explicitly capture both coarse-grained and fine-grained image-target matching, namely 1) the relevance between the image and the target and 2) the alignment between visual objects and the target. To tackle this issue, we propose a new multi-task learning architecture named coarse-to-fine grained Image-Target Matching network (ITM), which jointly performs image-target relevance classification, object-target alignment, and targeted sentiment classification. We further construct an Image-Target Matching dataset by manually annotating the image-target relevance and the visual object aligned with the input target. Experiments on two benchmark TMSC datasets show that our model consistently outperforms the baselines, achieves state-of-the-art results, and presents interpretable visualizations.
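As a rough illustration of what jointly performing the three tasks can mean in a multi-task setup, the sketch below sums one cross-entropy term per task. It is not the ITM objective from the paper: the function names, the loss weights `w_rel` and `w_align`, the treatment of object-target alignment as a single-label choice among candidate objects, and the rule of skipping the alignment term when no aligned object is annotated are all assumptions made for illustration.

```python
import math


def cross_entropy(logits, label):
    """Negative log-probability of `label` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def joint_loss(relevance_logits, relevance_label,
               alignment_logits, alignment_label,
               sentiment_logits, sentiment_label,
               w_rel=1.0, w_align=1.0):
    """Hypothetical weighted sum of the three task losses.

    relevance_*: coarse-grained task (is the image relevant to the target?)
    alignment_*: fine-grained task (which candidate object matches the target?)
    sentiment_*: main task (sentiment polarity of the target)
    """
    loss = cross_entropy(sentiment_logits, sentiment_label)
    loss += w_rel * cross_entropy(relevance_logits, relevance_label)
    # Assumption: with no annotated object (alignment_label is None),
    # the alignment term is simply left out.
    if alignment_label is not None:
        loss += w_align * cross_entropy(alignment_logits, alignment_label)
    return loss


# Uniform logits: 2 relevance classes, 4 candidate objects, 3 polarities.
example = joint_loss([0.0, 0.0], 1,
                     [0.0, 0.0, 0.0, 0.0], 2,
                     [0.0, 0.0, 0.0], 0)
```

In this sketch the auxiliary weights control how strongly the two matching tasks shape the shared representation relative to sentiment classification; setting both to zero reduces it to a plain sentiment classifier.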

Keywords: Natural Language Processing: Sentiment Analysis and Text Mining