Self-supervised Monocular Depth and Visual Odometry Learning with Scale-consistent Geometric Constraints

Mingkang Xiong; Zhenghong Zhang; Weilin Zhong; Jinsheng Ji; Jiyuan Liu; Huilin Xiong

doi:10.24963/ijcai.2020/134

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence

Main track. Pages 963-969. https://doi.org/10.24963/ijcai.2020/134

Self-supervised learning-based depth and visual odometry (VO) estimators trained on monocular videos without ground truth have drawn significant attention recently. Prior works use photometric consistency as supervision, which is fragile in complex real-world environments because of illumination variations. More importantly, it suffers from scale inconsistency in the depth and pose estimates. In this paper, robust geometric losses are proposed to address this problem. Specifically, we first align the scales of two reconstructed depth maps estimated from adjacent image frames, and then enforce forward-backward relative pose consistency to formulate scale-consistent geometric constraints. Finally, a novel training framework is constructed to implement the proposed losses. Extensive evaluations on the KITTI and Make3D datasets demonstrate that (i) by incorporating the proposed constraints as supervision, the depth estimation model achieves state-of-the-art (SOTA) performance among self-supervised methods, and (ii) the proposed training framework is effective for obtaining a VO model with a uniform global scale.
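
The two constraints named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the loss formulas, so everything here is an assumption made for illustration: the mean-ratio alignment rule, the normalized depth difference, the Frobenius-norm pose residual, and the helper names `align_scale`, `depth_consistency_loss`, `pose_consistency_loss`, and `make_pose` are hypothetical and are not taken from the paper.

```python
import numpy as np


def align_scale(depth_a, depth_b, eps=1e-8):
    """Rescale depth_b so that its mean matches depth_a.

    Assumption: a single global ratio of means is used as the scale factor.
    """
    valid = (depth_a > eps) & (depth_b > eps)
    scale = depth_a[valid].mean() / depth_b[valid].mean()
    return scale * depth_b, scale


def depth_consistency_loss(depth_a, depth_b, eps=1e-8):
    """Mean normalized difference between depth_a and scale-aligned depth_b."""
    aligned_b, _ = align_scale(depth_a, depth_b, eps)
    diff = np.abs(depth_a - aligned_b) / (depth_a + aligned_b + eps)
    return float(diff.mean())


def pose_consistency_loss(T_fwd, T_bwd):
    """Forward and backward relative poses (4x4) should compose to identity."""
    return float(np.linalg.norm(T_fwd @ T_bwd - np.eye(4)))


def make_pose(yaw, translation):
    """Hypothetical helper: 4x4 rigid transform from a yaw angle and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    T[:3, 3] = translation
    return T


# Toy data: the second depth map is the first one at half the scale.
rng = np.random.default_rng(0)
depth_a = rng.uniform(1.0, 10.0, size=(4, 5))
depth_b = 0.5 * depth_a
_, scale = align_scale(depth_a, depth_b)
depth_loss = depth_consistency_loss(depth_a, depth_b)

# Toy poses: a consistent pair (exact inverse) and an inconsistent pair.
T_fwd = make_pose(0.1, [0.0, 0.0, 1.0])
consistent_loss = pose_consistency_loss(T_fwd, np.linalg.inv(T_fwd))
inconsistent_loss = pose_consistency_loss(T_fwd, make_pose(-0.1, [0.0, 0.0, -2.0]))
```

In this toy setup the depth term vanishes once the pure scale difference is removed, and the pose term is near zero only when the backward pose is the inverse of the forward pose. A backward translation of the wrong magnitude, which is one way scale inconsistency shows up, produces a nonzero residual.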

Keywords:

Computer Vision: 2D and 3D Computer Vision

Robotics: Robotics and Vision