Collaborative Learning of Depth Estimation, Visual Odometry and Camera Relocalization from Monocular Videos

Haimei Zhao, Wei Bian, Bo Yuan, Dacheng Tao

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 488-494. https://doi.org/10.24963/ijcai.2020/68

Scene perception and understanding tasks, including depth estimation, visual odometry (VO) and camera relocalization, are fundamental for applications such as autonomous driving, robotics and drones. Driven by the power of deep learning, significant progress has been achieved on each individual task, but the rich correlations among the three tasks have been largely neglected. In previous studies, VO is generally accurate locally yet suffers from drift over long distances. By contrast, camera relocalization performs well in the global sense but lacks local precision. We argue that these two tasks should be strategically combined to leverage their complementary advantages, and further improved by exploiting the 3D geometric information in depth data, which in turn benefits depth estimation. Therefore, we present a collaborative learning framework consisting of DepthNet, LocalPoseNet and GlobalPoseNet with a joint optimization loss to estimate depth, VO and camera localization jointly. Moreover, the Geometric Attention Guidance Model is introduced to exploit the geometric relevance among the three branches during learning. Extensive experiments demonstrate that the joint learning scheme benefits all three tasks, and our method outperforms current state-of-the-art techniques in depth estimation and camera relocalization, with highly competitive performance in VO.
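
The abstract names three branches (DepthNet, LocalPoseNet, GlobalPoseNet) trained under a joint optimization loss. The PyTorch sketch below is a minimal illustration of that three-branch layout only; the layer choices, output parameterizations (6-DoF relative pose, 7-D absolute pose) and loss weights are illustrative assumptions, not the authors' actual architecture or training objective.

```python
# Minimal sketch of a three-branch depth / VO / relocalization model with a joint loss.
# Branch names follow the abstract; all internals are illustrative assumptions.
import torch
import torch.nn as nn


class DepthNet(nn.Module):
    """Predicts a per-pixel depth map from a single frame (toy encoder-decoder)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, 3, padding=1), nn.Softplus(),  # keep depth positive
        )

    def forward(self, img):
        return self.net(img)


class LocalPoseNet(nn.Module):
    """Predicts relative pose (VO) between two consecutive frames as 6 parameters."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, 6)  # 3 translation + 3 rotation (assumed parameterization)

    def forward(self, img_t, img_t1):
        feat = self.encoder(torch.cat([img_t, img_t1], dim=1)).flatten(1)
        return self.fc(feat)


class GlobalPoseNet(nn.Module):
    """Regresses an absolute camera pose (3-D position + 4-D quaternion) from one frame."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, 7)

    def forward(self, img):
        return self.fc(self.encoder(img).flatten(1))


def joint_loss(depth_loss, vo_loss, reloc_loss, w=(1.0, 1.0, 1.0)):
    """Weighted sum of the three per-branch losses (weights are placeholders)."""
    return w[0] * depth_loss + w[1] * vo_loss + w[2] * reloc_loss


if __name__ == "__main__":
    img_t = torch.randn(2, 3, 128, 416)
    img_t1 = torch.randn(2, 3, 128, 416)
    depth = DepthNet()(img_t)                 # (2, 1, 128, 416)
    rel_pose = LocalPoseNet()(img_t, img_t1)  # (2, 6)
    abs_pose = GlobalPoseNet()(img_t)         # (2, 7)
    print(depth.shape, rel_pose.shape, abs_pose.shape)
```

In practice the three per-branch losses would come from photometric/depth supervision, relative-pose supervision and absolute-pose supervision respectively, and the paper's Geometric Attention Guidance Model (not sketched here) would couple the branches during training.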
Keywords:
Computer Vision: 2D and 3D Computer Vision
Robotics: Localization, Mapping, State Estimation