VS-Boost: Boosting Visual-Semantic Association for Generalized Zero-Shot Learning

Xiaofan Li; Yachao Zhang; Shiran Bian; Yanyun Qu; Yuan Xie; Zhongchao Shi; Jianping Fan

doi:10.24963/ijcai.2023/123

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence

Main Track. Pages 1107-1115. https://doi.org/10.24963/ijcai.2023/123

Unlike conventional zero-shot learning (CZSL), which focuses only on recognizing unseen classes using a classifier trained on seen classes and semantic embeddings, generalized zero-shot learning (GZSL) aims to recognize both seen and unseen classes, and is therefore more challenging because of the extreme training imbalance. Recently, some feature generation methods have introduced metric learning to enhance the discriminability of visual features. Although these methods achieve good results, they apply metric learning only in the visual feature space and ignore the association between the visual feature space and the semantic space. Since GZSL methods use semantics as prior knowledge to transfer visual knowledge to unseen classes, consistency between the visual space and the semantic space is critical. To this end, we propose relational metric learning, which relates the metrics in the two spaces and makes their distributions more consistent. Building on a feature generation method and relational metric learning, we propose a novel GZSL method, termed VS-Boost, which effectively boosts the association between vision and semantics. Experimental results demonstrate that our method is effective and achieves significant gains over state-of-the-art methods on five benchmark datasets.
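
The abstract does not give the loss used by relational metric learning, so the following is only an illustrative sketch of the general idea of making pairwise relations agree across two spaces. It assumes cosine similarity as the metric and a mean squared difference between the two similarity matrices as the penalty; both choices, and the names `pairwise_cosine` and `relation_consistency_loss`, are hypothetical and are not taken from the paper.

```python
import numpy as np


def pairwise_cosine(x):
    """Return the (n, n) matrix of cosine similarities between the rows of x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T


def relation_consistency_loss(visual, semantic):
    """Mean squared gap between pairwise similarities in the two spaces.

    Assumption (not from the paper): row i of `visual` and row i of
    `semantic` describe the same sample, and relations are compared
    with cosine similarity.
    """
    sim_visual = pairwise_cosine(visual)
    sim_semantic = pairwise_cosine(semantic)
    return float(np.mean((sim_visual - sim_semantic) ** 2))


rng = np.random.default_rng(0)

# Four samples with 5-dimensional semantic embeddings (synthetic data).
semantic = rng.normal(size=(4, 5))

# Visual features built by an isometric map into 8 dimensions, so every
# pairwise cosine similarity is preserved and the two spaces agree.
q, _ = np.linalg.qr(rng.normal(size=(8, 5)))  # q has orthonormal columns
visual_aligned = semantic @ q.T

# Unrelated visual features, whose relations do not match the semantics.
visual_random = rng.normal(size=(4, 8))

loss_aligned = relation_consistency_loss(visual_aligned, semantic)
loss_random = relation_consistency_loss(visual_random, semantic)
```

In this toy setup the aligned features give a loss that is zero up to floating-point error, while unrelated features give a strictly larger value, which is the kind of signal a consistency term between the visual and semantic spaces would penalize.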

Keywords:

Computer Vision: CV: Transfer, low-shot, semi- and un- supervised learning

Computer Vision: CV: Neural generative models, auto encoders, GANs

Computer Vision: CV: Recognition (object detection, categorization)