Point Cloud Mixture-of-Domain-Experts Model for 3D Self-supervised Learning

Yaohua Zha; Tao Dai; Hang Guo; Yanzi Wang; Bin Chen; Ke Chen; Shu-Tao Xia

doi:10.24963/ijcai.2025/260


Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence

Main Track. Pages 2332-2340. https://doi.org/10.24963/ijcai.2025/260


Point clouds, a primary representation of 3D data, can be categorized into scene-domain point clouds and object-domain point clouds. Point cloud self-supervised learning (SSL) has become a mainstream paradigm for learning 3D representations. However, existing point cloud SSL methods mainly learn domain-specific 3D representations within a single domain and neglect the complementary nature of cross-domain knowledge, which limits the quality of the learned 3D representations. In this paper, we propose to learn a comprehensive Point cloud Mixture-of-Domain-Experts model (Point-MoDE) via a block-to-scene pre-training strategy. Specifically, we first propose a mixture-of-domain-experts model consisting of scene-domain experts and multiple shared object-domain experts. We then propose a block-to-scene pre-training strategy, which uses the features of point blocks in the object domain to regress their original positions in the scene domain, through object-level block mask reconstruction and scene-level block position regression. By integrating the complementary knowledge of the object and scene domains, this strategy facilitates the learning of both object-domain and scene-domain representations simultaneously, leading to a more comprehensive 3D representation. Extensive experiments on downstream tasks demonstrate the superiority of our model.
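The abstract does not give implementation details, so the following is only a minimal sketch, under stated assumptions, of how a block-to-scene objective of this general kind could be computed with NumPy. Every choice here is hypothetical and not taken from the paper: block centers are picked by random sampling (farthest point sampling is a common alternative), blocks are formed by k-nearest-neighbour grouping, the object-level term is a Chamfer distance over masked blocks, the scene-level term is a squared L2 error on block centers, and `alpha` is an arbitrary weight. The expert networks, encoder, and decoder are omitted; the predictions in the usage lines are stand-ins for model outputs.

```python
import numpy as np

def chamfer_distance(a, b):
    # Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)
    d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def scene_to_blocks(scene, centers, k):
    # Hypothetical block partition: the k nearest scene points to each center
    d = np.sum((centers[:, None, :] - scene[None, :, :]) ** 2, axis=-1)  # (B, N)
    idx = np.argsort(d, axis=1)[:, :k]
    blocks = scene[idx]  # (B, k, 3)
    # Object-domain view (assumption): re-center each block at the origin,
    # which hides its absolute position in the scene
    return blocks - centers[:, None, :]

def block_to_scene_loss(pred_points, target_points, pred_centers, true_centers, alpha=1.0):
    # Object-level term: reconstruct the points of the masked blocks
    rec = np.mean([chamfer_distance(p, t) for p, t in zip(pred_points, target_points)])
    # Scene-level term: regress each block's original position in the scene
    pos = np.mean(np.sum((pred_centers - true_centers) ** 2, axis=-1))
    return rec + alpha * pos

rng = np.random.default_rng(0)
scene = rng.uniform(-5, 5, size=(512, 3))
centers = scene[rng.choice(len(scene), size=8, replace=False)]
local = scene_to_blocks(scene, centers, k=32)
mask = np.zeros(8, dtype=bool)
mask[:4] = True  # hypothetical 50% block mask ratio

# A "perfect" stand-in model reproduces the masked blocks and every block center
loss = block_to_scene_loss(local[mask], local[mask], centers, centers)
print(loss)  # 0.0
```

Re-centering the blocks is what makes the position-regression term non-trivial in this sketch: once a block is expressed in local coordinates, its scene location can only be recovered from context, not read off its point coordinates.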

Keywords:

Computer Vision: CV: 3D computer vision

Computer Vision: CV: Recognition (object detection, categorization)