Understanding Matters: Semantic-Structural Determined Visual Relocalization for Large Scenes
Jingyi Nie, Liangliang Cai, Qichuan Geng, Zhong Zhou
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 8759-8767.
https://doi.org/10.24963/ijcai.2025/974
Scene Coordinate Regression (SCR) estimates 3D scene coordinates from 2D images and has become an important approach in visual relocalization. Existing methods achieve high localization accuracy in small scenes but still face substantial challenges in large-scale scenes, which usually exhibit significant variations in depth, scale, and occlusion. Although structure-guided scene partitioning is commonly adopted, over-partitioned elements and large feature variances within subscenes impede the estimation of 3D coordinates and introduce misleading information for subsequent processing. To address these issues, we propose the Semantic-Structural Determined Visual Relocalization method for SCR, which leverages semantic-structural partition learning and partition-determined pose refinement to better understand the semantic and structural information of large scenes. First, we partition the scene into small subscenes with label assignments, ensuring semantic consistency and structural continuity within each subscene. A classifier is then trained with sampling-based learning to predict these labels. Second, the partition predictions are encoded into embeddings and integrated with local features for intra-class compactness and inter-class separation, producing partition-aware features. To further decrease feature variances, we employ a discriminability metric and suppress ambiguous points, improving subsequent computations. Experimental results on the Cambridge Landmarks dataset demonstrate that the proposed method achieves significant improvements at lower training cost on large-scale scenes, reducing the median error by 38% compared to the state-of-the-art SCR method DSAC*. Code is available at https://gitee.com/VR_NAVE/ss-dvr.
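The idea of suppressing ambiguous points can be illustrated with a minimal sketch. The abstract does not define the discriminability metric, so the code below substitutes a simple stand-in: one minus the normalized entropy of a point's predicted subscene-label distribution. The function names `discriminability` and `suppress_ambiguous`, the `probs` field, and the 0.5 threshold are all hypothetical and are not taken from the authors' released code.

```python
import math


def discriminability(probs):
    """Stand-in metric (an assumption, not the paper's definition):
    1 - normalized entropy of a label distribution.
    Close to 1 for a confident prediction, close to 0 for a uniform one."""
    k = len(probs)
    if k < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(k)


def suppress_ambiguous(points, threshold=0.5):
    """Keep only points whose partition prediction scores at or above
    the (hypothetical) threshold."""
    return [pt for pt in points if discriminability(pt["probs"]) >= threshold]


# Toy predictions over four subscene labels for three sampled points.
points = [
    {"id": 0, "probs": [0.97, 0.01, 0.01, 0.01]},  # confident
    {"id": 1, "probs": [0.25, 0.25, 0.25, 0.25]},  # uniform, fully ambiguous
    {"id": 2, "probs": [0.40, 0.35, 0.15, 0.10]},  # spread across labels
]

kept = suppress_ambiguous(points)
kept_ids = [pt["id"] for pt in kept]
print(kept_ids)
```

With these toy inputs only the first point passes the threshold; the other two are discarded before any later step, such as pose estimation, would see them.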
Keywords:
Robotics: ROB: Localization, mapping, state estimation
Computer Vision: CV: 3D computer vision
Computer Vision: CV: Scene analysis and understanding
Robotics: ROB: Robotics and vision
