StackDRL: Stacked Deep Reinforcement Learning for Fine-grained Visual Categorization

Xiangteng He, Yuxin Peng, Junjie Zhao

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 741-747. https://doi.org/10.24963/ijcai.2018/103

Fine-grained visual categorization (FGVC) aims to discriminate between similar subcategories, and its main challenge is to localize the subtle visual distinctions among them. Two pivotal problems arise: discovering which regions are discriminative and representative, and determining how many discriminative regions are needed to achieve the best performance. Existing methods generally address these two problems by relying on prior knowledge or experimental validation, which severely restricts the usability and scalability of FGVC. To address the "which" and "how many" problems adaptively, this paper proposes a stacked deep reinforcement learning approach (StackDRL). It adopts a two-stage learning architecture driven by a semantic reward function. The two stages localize the object and then its parts in sequence ("which"), and determine the number of discriminative regions adaptively ("how many"), which is highly desirable in FGVC. The semantic reward function drives StackDRL to fully learn discriminative and conceptual visual information by jointly combining an attention-based reward and a category-based reward. Furthermore, unsupervised discriminative localization avoids the heavy labor cost of labeling and greatly strengthens the usability and scalability of StackDRL. Compared with ten state-of-the-art methods on the CUB-200-2011 dataset, StackDRL achieves the best categorization accuracy.
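The abstract does not define the semantic reward, so the following is only a minimal sketch of the general idea of jointly combining an attention-based term and a category-based term into one scalar reward for a region-selection action. Everything here is an assumption for illustration: the names `Box`, `attention_reward`, `category_reward`, `semantic_reward`, the weight `alpha`, and the specific definitions (attention mass inside the chosen region; +1/-1 depending on whether the true-class score improved) are hypothetical and are not StackDRL's actual formulation.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical region: half-open pixel bounds [x0, x1) x [y0, y1).
    x0: int
    y0: int
    x1: int
    y1: int


def attention_reward(attn: list[list[float]], box: Box) -> float:
    """Hypothetical attention-based term: fraction of total attention mass inside the box."""
    total = sum(sum(row) for row in attn)
    if total == 0:
        return 0.0
    inside = sum(
        attn[y][x] for y in range(box.y0, box.y1) for x in range(box.x0, box.x1)
    )
    return inside / total


def category_reward(prev_score: float, new_score: float) -> float:
    """Hypothetical category-based term: +1 if the true-class score improved, else -1."""
    return 1.0 if new_score > prev_score else -1.0


def semantic_reward(
    attn: list[list[float]],
    box: Box,
    prev_score: float,
    new_score: float,
    alpha: float = 0.5,  # hypothetical trade-off weight between the two terms
) -> float:
    """Weighted sum of the two hypothetical terms (an assumed combination rule)."""
    return alpha * attention_reward(attn, box) + (1 - alpha) * category_reward(
        prev_score, new_score
    )


# Toy attention map with all mass in the top-left 2x2 block.
attn = [
    [4, 4, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
good = semantic_reward(attn, Box(0, 0, 2, 2), prev_score=0.3, new_score=0.6)
bad = semantic_reward(attn, Box(2, 2, 4, 4), prev_score=0.6, new_score=0.5)
print(good, bad)
```

In this toy setup, selecting the high-attention region while improving the true-class score earns the maximal reward (1.0), whereas selecting an empty region that lowers the score is penalized (-0.5). A weighted sum is just one simple way to combine the two signals; the paper may use a different rule.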
Keywords:
Computer Vision: Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation
Machine Learning Applications: Applications of Reinforcement Learning