Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation

Niaz Ahmad, Jawad Khan, Kang G. Shin, Youngmoon Lee, Guanghui Wang

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 665-673. https://doi.org/10.24963/ijcai.2025/75

The dynamic movement of the human body presents a fundamental challenge for human pose estimation and body segmentation. State-of-the-art approaches primarily rely on combining keypoint heatmaps with segmentation masks, but they often struggle when joints overlap during pose estimation or when poses change rapidly during instance-level segmentation. To address these limitations, we propose Keypoints as Dynamic Centroids (KDC), a new centroid-based representation for unified human pose estimation and instance-level segmentation. KDC adopts a bottom-up paradigm to generate keypoint heatmaps for both easily distinguishable and complex keypoints, and it improves keypoint detection and confidence scores by introducing KeyCentroids using a keypoint disk. It then uses high-confidence keypoints as dynamic centroids in the embedding space to generate MaskCentroids, allowing pixels to be clustered swiftly to specific human instances during rapid changes in body movement in a live environment. Our experimental evaluations focus on crowded and occluded cases using the CrowdPose, OCHuman, and COCO benchmarks, demonstrating KDC’s effectiveness and generalizability in challenging scenarios in terms of both accuracy and runtime performance. Our implementation is available at https://sites.google.com/view/niazahmad/projects/kdc.
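The idea of clustering pixels around keypoint centroids in an embedding space can be illustrated with a toy sketch. This is not the authors' implementation: the function name, the `max_distance` threshold, the two-dimensional embeddings, and the nearest-centroid rule are all assumptions made for illustration, and the abstract does not specify how KDC actually performs the assignment.

```python
from math import dist

def assign_pixels_to_instances(pixel_embeddings, centroids, max_distance=1.0):
    """Assign each pixel embedding to its nearest instance centroid.

    pixel_embeddings: list of coordinate tuples, one per pixel (hypothetical).
    centroids: dict mapping an instance id to the embedding of a
        high-confidence keypoint acting as that instance's centroid.
    max_distance: assumed cutoff; pixels farther than this from every
        centroid are left unassigned (treated as background).
    """
    labels = []
    for embedding in pixel_embeddings:
        best_id, best_distance = None, max_distance
        for instance_id, centre in centroids.items():
            d = dist(embedding, centre)  # Euclidean distance
            if d <= best_distance:
                best_id, best_distance = instance_id, d
        labels.append(best_id)
    return labels

# Two hypothetical person instances and three pixel embeddings.
centroids = {"person_a": (0.0, 0.0), "person_b": (5.0, 5.0)}
pixels = [(0.2, -0.1), (4.8, 5.3), (2.5, 2.5)]
labels = assign_pixels_to_instances(pixels, centroids)
print(labels)
```

In this sketch the centroids are simply passed in per call, which loosely mirrors the notion of "dynamic" centroids that move with the detected keypoints rather than being fixed in advance.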
Keywords:
Computer Vision: CV: Biometrics, face, gesture and pose recognition
Computer Vision: CV: Action and behavior recognition
Computer Vision: CV: Segmentation, grouping and shape analysis
Computer Vision: CV: Video analysis and understanding