Egocentric Object-Interaction Anticipation with Retentive and Predictive Learning
Guo Chen, Yifei Huang, Yin-dong Zheng, Yicheng Liu, Jiahao Wang, Tong Lu
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 783-791.
https://doi.org/10.24963/ijcai.2025/88
Egocentric object-interaction anticipation is critical for applications such as augmented reality and robotics, but existing methods struggle with misaligned egocentric encoding, insufficient supervision, and underutilized historical context. These limitations stem from a lack of focus on retention, i.e., retaining long-term object-centric interactions, and prediction, i.e., future-centric encoding and future uncertainty modeling. We introduce EgoAnticipator, a novel Retentive and Predictive Learning framework that addresses these challenges. Our approach combines retentive pre-training for domain-specific encoding, predictive pre-training for future uncertainty modeling, and mirror distillation to transfer future-informed knowledge. In addition, we propose long-term memory prompting to integrate historical interaction cues. We evaluate our framework on the Ego4D short-term object interaction anticipation benchmark, on both STAv1 and STAv2. Extensive experiments demonstrate that our framework outperforms existing methods, and ablation studies confirm the effectiveness of each component of our retentive and predictive learning design.
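To give a rough sense of the mirror-distillation idea mentioned in the abstract (a future-informed teacher guiding a past-only student), the following is a minimal PyTorch-style sketch. All class names, encoder choices, feature sizes, and the loss weighting are assumptions introduced for illustration only; they do not reflect the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MirrorDistillationSketch(nn.Module):
    """Hypothetical sketch: a future-aware teacher guides a past-only student.

    The encoders here are toy linear layers standing in for video backbones;
    every name and hyperparameter below is an assumption, not the paper's code.
    """

    def __init__(self, feat_dim: int = 256, distill_weight: float = 0.5):
        super().__init__()
        self.student_encoder = nn.Linear(feat_dim, feat_dim)  # sees past frames only
        self.teacher_encoder = nn.Linear(feat_dim, feat_dim)  # sees past + future frames
        self.head = nn.Linear(feat_dim, feat_dim)             # toy anticipation head
        self.distill_weight = distill_weight

    def forward(self, past_feats, future_feats, target):
        # Teacher encodes the "mirrored" input that also contains future context.
        with torch.no_grad():
            teacher_repr = self.teacher_encoder(past_feats + future_feats)
        # Student observes only the past context available at inference time.
        student_repr = self.student_encoder(past_feats)
        # Task loss on the anticipation target plus a feature-matching term
        # that distills future-informed knowledge into the student.
        task_loss = F.mse_loss(self.head(student_repr), target)
        distill_loss = F.mse_loss(student_repr, teacher_repr)
        return task_loss + self.distill_weight * distill_loss


if __name__ == "__main__":
    model = MirrorDistillationSketch()
    past = torch.randn(4, 256)
    future = torch.randn(4, 256)
    target = torch.randn(4, 256)
    loss = model(past, future, target)
    loss.backward()
    print(f"toy combined loss: {loss.item():.4f}")
```

The key design point this sketch tries to capture is that the teacher branch is never needed at inference: only the student, trained to mimic future-informed representations, is deployed for anticipation.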
Keywords:
Computer Vision: CV: Video analysis and understanding
