Hierarchical Instance Feature Alignment for 2D Image-Based 3D Shape Retrieval

Heyu Zhou; Weizhi Nie; Wenhui Li; Dan Song; An-An Liu

doi:10.24963/ijcai.2020/117

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence

Main track. Pages 839-845. https://doi.org/10.24963/ijcai.2020/117

2D image-based 3D shape retrieval has become a hot research topic because of its wide industrial applications and academic significance. However, existing view-based 3D shape retrieval methods are restricted by two limitations: 1) they learn common-class features while neglecting the visual characteristics of individual instances, and 2) they narrow the global domain variations while ignoring the local semantic variations within each category. To overcome these problems, we propose a novel hierarchical instance feature alignment (HIFA) method for this task. HIFA consists of two modules: cross-modal instance feature learning and hierarchical instance feature alignment. Specifically, we first use a CNN to extract both 2D image features and multi-view features of 3D shapes. Then, we maximize the mutual information between the input data and the high-level feature to preserve as many of the visual characteristics of each individual instance as possible. To mix the features of the two domains, we enforce feature alignment at both the global domain level and the local semantic level. To narrow the global domain variations, we impose an identical large-norm restriction on the feature-norm expectations of both the 2D and 3D domains, which makes the features more transferable. To narrow the local variations, we minimize the distance between the two centroids of the same class from different domains to obtain semantic consistency. Extensive experiments on two popular and novel datasets, MI3DOR and MI3DOR-2, validate the superiority of HIFA for the 2D image-based 3D shape retrieval task.
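The two alignment terms described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the squared-error form of both penalties, the default `target_norm` value, and the assumption that class labels (or pseudo-labels) are available for both domains are all hypothetical choices made for illustration. Features are plain lists of floats so the example has no dependencies.

```python
import math


def mean_norm(features):
    """Average Euclidean (L2) norm over a batch of feature vectors."""
    return sum(math.sqrt(sum(x * x for x in f)) for f in features) / len(features)


def norm_alignment_loss(feats_2d, feats_3d, target_norm=25.0):
    """Global term (assumed form): pull the mean feature norm of each
    domain toward the same large target value.
    target_norm is a hypothetical setting, not a value from the paper."""
    gap_2d = mean_norm(feats_2d) - target_norm
    gap_3d = mean_norm(feats_3d) - target_norm
    return gap_2d ** 2 + gap_3d ** 2


def class_centroids(features, labels):
    """Map each class label to the mean feature vector of its samples."""
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    return {y: [sum(col) / len(fs) for col in zip(*fs)] for y, fs in groups.items()}


def centroid_alignment_loss(feats_2d, labels_2d, feats_3d, labels_3d):
    """Local term (assumed form): mean squared distance between the 2D and
    3D centroids of every class that appears in both domains."""
    c2d = class_centroids(feats_2d, labels_2d)
    c3d = class_centroids(feats_3d, labels_3d)
    shared = sorted(set(c2d) & set(c3d))
    if not shared:
        return 0.0  # no common class in this batch, nothing to align
    total = sum(
        sum((a - b) ** 2 for a, b in zip(c2d[y], c3d[y])) for y in shared
    )
    return total / len(shared)


if __name__ == "__main__":
    feats_2d = [[3.0, 4.0], [0.0, 5.0]]   # toy 2D image features
    feats_3d = [[6.0, 8.0], [0.0, 10.0]]  # toy multi-view 3D features
    labels = [0, 1]
    print(norm_alignment_loss(feats_2d, feats_3d, target_norm=5.0))
    print(centroid_alignment_loss(feats_2d, labels, feats_3d, labels))
```

In a training setup these two terms would typically be added, with weights, to the feature-learning objective; the weighting scheme and the mutual-information term are not specified in the abstract and are left out here.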

Keywords:

Computer Vision: 2D and 3D Computer Vision

Computer Vision: Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation

Machine Learning: Transfer, Adaptation, Multi-task Learning