DiffSQL: Leveraging Diffusion Model for Zero-Shot Self-Supervised Monocular Depth Estimation

Heyuan Zheng; Yunji Liang; Lei Liu; Zhiwen Yu

doi:10.24963/ijcai.2025/981

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence

Main Track. Pages 8823-8831. https://doi.org/10.24963/ijcai.2025/981

Self-supervised monocular depth estimation has attracted significant attention due to its broad applications in autonomous driving and robotics. Although the Self Query Layer (SQL) has brought significant performance improvements by learning the relative distances between objects, it struggles with zero-shot generalization because it lacks geometric features and uses a fixed query size. To address these problems, we propose DiffSQL, a diffusion-augmented self-supervised depth estimation framework that learns geometric priors for feature augmentation. Additionally, we introduce a dynamic self-query layer that implicitly computes the relative distances between objects by adjusting the query size according to the feature distribution. Experimental results on the KITTI dataset show that DiffSQL outperforms SQLdepth by 1.03% in terms of AbsRel and 2.79% in terms of SqRel. Furthermore, our experiments demonstrate that DiffSQL achieves superior zero-shot generalization.
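
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of AbsRel (absolute relative error) and SqRel (squared relative error) as they are commonly defined in the depth estimation literature; lower values are better for both. The function names, the valid-pixel rule (ground truth greater than zero), and the sample values are illustrative assumptions and are not taken from the paper or its evaluation code.

```python
def _valid_pairs(pred, gt):
    # Keep only pixels with a positive ground-truth depth (assumed validity rule).
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return pairs


def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def sq_rel(pred, gt):
    """Mean of (pred - gt)^2 / gt over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum((p - g) ** 2 / g for p, g in pairs) / len(pairs)


# Hypothetical depths in metres for three pixels.
predicted = [2.0, 4.0, 10.0]
ground_truth = [2.0, 5.0, 8.0]

print(f"AbsRel = {abs_rel(predicted, ground_truth):.4f}")
print(f"SqRel  = {sq_rel(predicted, ground_truth):.4f}")
```

Both metrics normalise the error by the ground-truth depth, so errors on nearby objects weigh more heavily than errors of the same size on distant ones.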

Keywords:

Robotics: ROB: Robotics and vision

Computer Vision: CV: 3D computer vision

Robotics: ROB: Localization, mapping, state estimation