Multi-attention Meta Learning for Few-shot Fine-grained Image Recognition

Yaohui Zhu; Chenlong Liu; Shuqiang Jiang

doi:10.24963/ijcai.2020/152

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence

Main track. Pages 1090-1096. https://doi.org/10.24963/ijcai.2020/152

The goal of few-shot image recognition is to distinguish categories given only one or a few training samples per category. Previous few-shot learning work has focused mainly on general object images, and current solutions usually learn a global image representation from training tasks and adapt it to novel tasks. However, fine-grained categories are distinguished by subtle, local parts, which global representations cannot capture effectively. This may prevent existing few-shot learning approaches from handling fine-grained categories well. In this work, we propose a multi-attention meta-learning (MattML) method for few-shot fine-grained image recognition (FSFGIR). Instead of using only the base learner for general feature learning, the proposed method uses attention mechanisms in both the base learner and the task learner to capture discriminative parts of images. The base learner is equipped with two convolutional block attention modules (CBAMs) and a classifier. The two CBAMs can learn diverse and informative parts. The initial weights of the classifier are attended by the task learner, which gives the classifier a task-sensitive initialization. For adaptation, a gradient-based meta-learning approach updates the parameters of the two CBAMs and the attended classifier, which helps the updated base learner focus adaptively on discriminative parts. We experimentally analyze the different components of our method, and experimental results on four benchmark datasets demonstrate its effectiveness and superiority.
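The idea of an attended classifier initialization followed by gradient-based adaptation can be illustrated with a toy sketch. This is not the authors' implementation: the CBAMs are omitted (features are assumed to be already extracted), the sigmoid gate in `attend_weights` is a hypothetical stand-in for the task learner, and all names, numbers, and hyperparameters are made up for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attend_weights(w0, support_x):
    """Hypothetical task learner (an assumption, not the paper's design):
    scale each feature column of the shared initial weights by a sigmoid
    gate computed from the support-set mean feature."""
    dim = len(support_x[0])
    mean = [sum(x[d] for x in support_x) / len(support_x) for d in range(dim)]
    gate = [1.0 / (1.0 + math.exp(-m)) for m in mean]
    return [[w * g for w, g in zip(row, gate)] for row in w0]

def loss_and_grad(w, xs, ys):
    """Mean softmax cross-entropy of a linear classifier and its gradient."""
    n_cls, dim = len(w), len(w[0])
    grad = [[0.0] * dim for _ in range(n_cls)]
    loss = 0.0
    for x, y in zip(xs, ys):
        probs = softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in w])
        loss -= math.log(probs[y])
        for c in range(n_cls):
            err = probs[c] - (1.0 if c == y else 0.0)
            for d in range(dim):
                grad[c][d] += err * x[d] / len(xs)
    return loss / len(xs), grad

def adapt(w0, support_x, support_y, lr=0.5, steps=5):
    """Inner loop: attended initialization, then plain gradient steps."""
    w = attend_weights(w0, support_x)
    for _ in range(steps):
        _, grad = loss_and_grad(w, support_x, support_y)
        w = [[wv - lr * gv for wv, gv in zip(wr, gr)] for wr, gr in zip(w, grad)]
    return w

def predict(w, x):
    scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    return scores.index(max(scores))

# Toy 2-way 2-shot task with 3-dimensional features (made-up numbers).
support_x = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.1, 1.0, 0.0], [0.0, 0.8, 0.2]]
support_y = [0, 0, 1, 1]
w0 = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]  # shared (meta-learned) initial weights

loss_before, _ = loss_and_grad(attend_weights(w0, support_x), support_x, support_y)
w_adapted = adapt(w0, support_x, support_y)
loss_after, _ = loss_and_grad(w_adapted, support_x, support_y)
```

In this sketch only the classifier is adapted; in the method described above, the parameters of the two CBAMs are updated in the same inner loop, and the outer meta-training loop that learns the shared initialization is left out entirely.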

Keywords:

Computer Vision: Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation