SE(3)-Equivariant Diffusion Models for 3D Object Analysis

Xie Min, Zhao Jieyu, Shen Kedi, Chen Kangxin

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 1738-1746. https://doi.org/10.24963/ijcai.2025/194

SE(3)-equivariance is a critical property for capturing pose information in 3D vision tasks, enabling models to handle transformations such as rotations and translations effectively. While equivariant diffusion models have recently shown promise in 3D object reassembly thanks to their generative and denoising capabilities, they face key challenges when applied to this task. Specifically, traditional diffusion models rely on fixed input sizes, which limits their adaptability to varying numbers of parts, and their linear noise addition and removal processes struggle to address the inherently nonlinear transformations of 3D parts. To overcome these limitations, this paper proposes an SE(3)-equivariant diffusion model for pose denoising and 3D object reassembly from fragmented parts. The model incorporates an equivariant encoder to extract SE(3)-equivariant features, a Lie algebra mapping to linearize noise addition and removal, and an elastic diffusion framework capable of adapting to varying numbers of parts and nonlinear transformations. By leveraging these components, the method achieves accurate and robust pose predictions across diverse input configurations. Experiments conducted on the Breaking Bad dataset, the real-world RePAIR dataset, and a self-constructed 3D mannequin dataset demonstrate the effectiveness of the proposed model, which outperforms state-of-the-art methods on metrics such as root mean square error and part accuracy. Ablation studies further validate the contributions of the key modules, highlighting their roles in improving accuracy and robustness for 3D part reassembly tasks.
Keywords:
Computer Vision: CV: 3D computer vision
Computer Vision: CV: Biometrics, face, gesture and pose recognition
Computer Vision: CV: Machine learning for vision
Machine Learning: ML: Generative models
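
The abstract's idea of using a Lie algebra mapping to linearize noise addition can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`hat`, `exp_so3`, `log_so3`, `add_pose_noise`) and the noise scale `sigma` are hypothetical, and the sketch simplifies SE(3) by treating rotation (via so(3)) and translation separately. It only shows that Gaussian noise added linearly in the tangent space still maps back to a valid rotation.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its 3x3 skew-symmetric matrix in so(3)."""
    x, y, z = w
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def exp_so3(w):
    """Rodrigues' formula: so(3) vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near the identity
    return (np.eye(3)
            + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def log_so3(R):
    """Rotation matrix -> so(3) vector (valid for rotation angles below pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * v
    return theta / (2.0 * np.sin(theta)) * v

def add_pose_noise(R, t, sigma, rng):
    """Hypothetical forward-noising step (an assumption, not the paper's).

    Noise is added linearly to the rotation's Lie algebra coordinates and
    to the translation, then the rotation is mapped back to the group.
    """
    w_noisy = log_so3(R) + sigma * rng.standard_normal(3)
    t_noisy = t + sigma * rng.standard_normal(3)
    return exp_so3(w_noisy), t_noisy

rng = np.random.default_rng(0)
R = exp_so3(np.array([0.3, -0.2, 0.5]))
R_noisy, t_noisy = add_pose_noise(R, np.zeros(3), sigma=0.1, rng=rng)
```

Because the perturbation happens in the vector space so(3), `R_noisy` remains orthogonal with determinant 1 by construction, which would not hold if noise were added directly to the entries of `R`.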