Learning to Estimate Object Poses without Real Image Annotations

Haotong Lin, Sida Peng, Zhize Zhou, Xiaowei Zhou

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 1159-1165. https://doi.org/10.24963/ijcai.2022/162

This paper presents a simple yet effective approach for learning 6DoF object pose estimation without real image annotations. Previous methods have attempted to train pose estimators on synthetic data, but the sim-to-real domain gap prevents them from generalizing well to real images, so their pose estimates are inaccurate. We find that, in most cases, synthetically trained pose estimators can still provide a reasonable initialization for depth-based pose refinement methods, which then yield accurate pose estimates. Motivated by this, we propose a novel learning framework that uses the accurate results of depth-based pose refinement to supervise the RGB-based pose estimator. Our method significantly outperforms previous self-supervised methods on several benchmarks. Even compared with fully supervised methods that use real annotated data, we achieve competitive results without using any real annotations. The code is available at https://github.com/zju3dv/pvnet-depth-sup.
Keywords:
Computer Vision: 3D Computer Vision
Computer Vision: Recognition (object detection, categorization)
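
The supervision idea described in the abstract (a coarse pose from a synthetically trained estimator, refined against depth, then reused as a training label) can be illustrated with a toy sketch. This is not the authors' implementation: the refinement here is a plain ICP-style loop (nearest-neighbour matching followed by a Kabsch alignment) standing in for whichever depth-based refiner is actually used, and the function names, the residual threshold, and the synthetic grid "object" are all assumptions made for illustration.

```python
import numpy as np

def rot_z(deg):
    """Rotation matrix about the z-axis (helper for the toy data)."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst points."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c

def refine_with_depth(R, t, model_pts, depth_pts, iters=5):
    """ICP-style stand-in for depth-based refinement (hypothetical helper)."""
    for _ in range(iters):
        posed = model_pts @ R.T + t
        # Brute-force nearest neighbour of each posed model point in the depth cloud.
        d2 = ((posed[:, None, :] - depth_pts[None, :, :]) ** 2).sum(axis=-1)
        matched = depth_pts[d2.argmin(axis=1)]
        R, t = kabsch(model_pts, matched)
    residual = np.linalg.norm(model_pts @ R.T + t - matched, axis=1).mean()
    return R, t, residual

def make_pseudo_label(R_init, t_init, model_pts, depth_pts, max_residual=0.005):
    """Keep the refined pose as a training label only if it fits the depth well.

    The threshold value is an assumption, not taken from the paper.
    """
    R, t, residual = refine_with_depth(R_init, t_init, model_pts, depth_pts)
    return (R, t) if residual < max_residual else None

# Toy object: a 3x3x3 grid of points with 0.1 m spacing.
axis = np.array([-0.1, 0.0, 0.1])
model_pts = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

# "Observed" depth points under the true pose (noise-free for simplicity).
R_true, t_true = rot_z(30.0), np.array([0.1, -0.05, 0.8])
depth_pts = model_pts @ R_true.T + t_true

# Coarse estimate, as a synthetically trained RGB estimator might produce.
R_init, t_init = rot_z(32.0), t_true + np.array([0.005, 0.0, 0.0])

label = make_pseudo_label(R_init, t_init, model_pts, depth_pts)
```

In a training loop, a label accepted this way would serve as the regression target for the RGB-based estimator on that real image, while rejected samples would simply be skipped. How accepted poses are converted into network targets depends on the estimator and is not covered by this sketch.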