Visual Emotion Representation Learning via Emotion-Aware Pre-training

Yue Zhang, Wanying Ding, Ran Xu, Xiaohua Hu

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 1679-1685. https://doi.org/10.24963/ijcai.2022/234

Despite recent progress in deep learning, visual emotion recognition remains challenging due to the ambiguity of emotion perception, the diversity of concepts related to visual emotion, and the lack of large-scale annotated datasets. In this paper, we present a large-scale multimodal pre-training method that learns visual emotion representations by aligning (emotion, object, attribute) triplets with images via a contrastive loss. We pre-train on a large web dataset with noisy tags and fine-tune on visual emotion classification datasets. Our method achieves state-of-the-art performance on visual emotion classification.
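The contrastive alignment described in the abstract can be illustrated with a minimal sketch. The snippet below is an assumption, not the paper's actual implementation: it shows a symmetric InfoNCE-style loss that pulls each image embedding toward the embedding of its paired (emotion, object, attribute) triplet while pushing it away from the other triplets in the batch. The function and variable names are hypothetical.

```python
import numpy as np

def contrastive_alignment_loss(image_emb, triplet_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss aligning image embeddings with
    (emotion, object, attribute) triplet embeddings.

    image_emb, triplet_emb: (batch, dim) arrays; row i of each is a
    positive pair, all other rows in the batch serve as negatives.
    """
    # L2-normalise so dot products become cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = triplet_emb / np.linalg.norm(triplet_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature   # (batch, batch) similarity matrix
    labels = np.arange(len(img))         # positives lie on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Symmetric: image-to-triplet and triplet-to-image directions.
    return 0.5 * (cross_entropy(logits, labels)
                  + cross_entropy(logits.T, labels))
```

Minimising this objective drives matched image/triplet pairs together in the shared embedding space, so the loss for perfectly aligned embeddings is near zero while mismatched pairs incur a higher loss.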
Keywords:
Computer Vision: Vision and Language
Natural Language Processing: Sentiment Analysis and Text Mining
Humans and AI: Cognitive Modeling