Visual Emotion Representation Learning via Emotion-Aware Pre-training

Yue Zhang; Wanying Ding; Ran Xu; Xiaohua Hu

doi:10.24963/ijcai.2022/234

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence

Main Track. Pages 1679-1685. https://doi.org/10.24963/ijcai.2022/234

Despite recent progress in deep learning, visual emotion recognition remains a challenging problem due to the ambiguity of emotion perception, the diversity of concepts related to visual emotion, and the lack of large-scale annotated datasets. In this paper, we present a large-scale multimodal pre-training method that learns visual emotion representations by aligning (emotion, object, attribute) triplets with a contrastive loss. We pre-train on a large web dataset with noisy tags and fine-tune on visual emotion classification datasets. Our method achieves state-of-the-art performance for visual emotion classification.
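
The abstract does not state the exact form of the contrastive loss, so the following is only a minimal sketch of one common choice: a symmetric InfoNCE objective that pulls each image embedding toward the embedding of its own (emotion, object, attribute) triplet and pushes it away from the other triplets in the batch. The function names, the temperature value of 0.07, and the assumption that images and triplets are already encoded into fixed-size vectors are all illustrative and not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(image_emb, triplet_emb, temperature=0.07):
    """Hypothetical symmetric InfoNCE loss over a batch of matched pairs.

    image_emb:   (batch, dim) array of image embeddings.
    triplet_emb: (batch, dim) array of embeddings of the matching
                 (emotion, object, attribute) triplets; row i pairs with row i.
    temperature: assumed scaling factor, not a value from the paper.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(triplet_emb)
    # Pairwise cosine similarities, sharpened by the temperature.
    logits = img @ txt.T / temperature
    idx = np.arange(logits.shape[0])
    # Image-to-triplet direction: each row should peak on its diagonal entry.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Triplet-to-image direction: each column should peak on its diagonal entry.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

In this sketch the loss is small when row i of `image_emb` is most similar to row i of `triplet_emb` and large when the pairing is scrambled, which is the behavior an alignment objective of this kind is meant to reward.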

Keywords:

Computer Vision: Vision and language

Natural Language Processing: Sentiment Analysis and Text Mining

Humans and AI: Cognitive Modeling