Timestamp-Supervised Action Segmentation from the Perspective of Clustering

Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Fuchun Sun

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 690-698. https://doi.org/10.24963/ijcai.2023/77

Video action segmentation under timestamp supervision has recently received much attention because of its lower annotation cost. Most existing methods generate pseudo-labels for all frames in each video to train the segmentation model. However, these methods suffer from incorrect pseudo-labels, especially for the semantically unclear frames in the transition region between two consecutive actions, which we call ambiguous intervals. To address this issue, we propose a novel framework from the perspective of clustering, which consists of two parts. First, pseudo-label ensembling generates incomplete but high-quality pseudo-label sequences in which the frames in ambiguous intervals have no pseudo-labels. Second, iterative clustering propagates the pseudo-labels to the ambiguous intervals by clustering, and thus updates the pseudo-label sequences used to train the model. We further introduce a clustering loss that encourages the features of frames within the same action segment to be more compact. Extensive experiments show the effectiveness of our method.
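To make the two ideas in the abstract more concrete, the following NumPy sketch shows one plausible way to fill an ambiguous interval by clustering and to score how compact each action segment is. It is not the authors' implementation. The function names (`segment_runs`, `propagate_labels`, `compactness_loss`), the use of `-1` to mark frames without a pseudo-label, the contiguity-constrained two-cluster split, and the mean-squared-distance-to-segment-centroid loss are all illustrative assumptions.

```python
import numpy as np


def segment_runs(labels, ignore=-1):
    """Return (start, end, label) for each maximal run of equal labels, skipping `ignore` runs."""
    runs = []
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            if labels[start] != ignore:
                runs.append((start, t, labels[start]))
            start = t
    return runs


def propagate_labels(features, labels, ignore=-1):
    """Hypothetical sketch: fill each run of unlabeled frames lying between two labeled segments.

    The run is split at a single cut point so that frames before the cut join the
    left segment and frames after it join the right segment; the cut minimizes the
    total squared distance to the two neighbouring segment centroids
    (a 2-cluster assignment constrained to keep both segments contiguous).
    """
    labels = np.asarray(labels).copy()
    orig = labels.copy()  # centroids are computed from the incoming pseudo-labels only
    T = len(labels)
    t = 0
    while t < T:
        if labels[t] != ignore:
            t += 1
            continue
        start = t
        while t < T and labels[t] == ignore:
            t += 1
        end = t  # ambiguous run is frames [start, end)
        if start == 0 or end == T:
            continue  # no labeled neighbour on one side: leave these frames unlabeled
        left, right = orig[start - 1], orig[end]
        # extent of the labeled segments directly adjacent to the ambiguous run
        l_start = start - 1
        while l_start > 0 and orig[l_start - 1] == left:
            l_start -= 1
        r_end = end
        while r_end < T and orig[r_end] == right:
            r_end += 1
        c_left = features[l_start:start].mean(axis=0)
        c_right = features[end:r_end].mean(axis=0)
        amb = features[start:end]
        d_left = ((amb - c_left) ** 2).sum(axis=1)
        d_right = ((amb - c_right) ** 2).sum(axis=1)
        # cost[k] = cost of giving the first k ambiguous frames to `left`, the rest to `right`
        cum_left = np.concatenate([[0.0], np.cumsum(d_left)])
        cum_right = np.concatenate([[0.0], np.cumsum(d_right)])
        cost = cum_left + (cum_right[-1] - cum_right)
        k = int(np.argmin(cost))
        labels[start:start + k] = left
        labels[start + k:end] = right
    return labels


def compactness_loss(features, labels, ignore=-1):
    """Hypothetical compactness term: mean squared distance of each labeled frame to its segment mean."""
    total, count = 0.0, 0
    for s, e, _ in segment_runs(labels, ignore):
        seg = features[s:e]
        total += float(((seg - seg.mean(axis=0)) ** 2).sum())
        count += e - s
    return total / max(count, 1)


# Usage: two actions (0 and 1) with three ambiguous frames between them.
feats = np.array([[0.0], [0.0], [0.1], [0.9], [1.0], [1.0], [1.0]])
filled = propagate_labels(feats, [0, 0, -1, -1, -1, 1, 1])
print(filled.tolist())
print(compactness_loss(feats, filled))
```

Restricting each ambiguous interval to a single cut point keeps both neighbouring actions contiguous, which a free per-frame nearest-centroid assignment would not guarantee. A lower compactness value means frames within each segment lie closer to their segment mean.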
Keywords:
Computer Vision: CV: Video analysis and understanding
Computer Vision: CV: Applications
Computer Vision: CV: Machine learning for vision