Action-Guided Attention Mining and Relation Reasoning Network for Human-Object Interaction Detection

Xue Lin, Qi Zou, Xixia Xu

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 1104-1110. https://doi.org/10.24963/ijcai.2020/154

Human-object interaction (HOI) detection is important for understanding human-centric scenes and is challenging because of the subtle differences between fine-grained actions and the presence of multiple co-occurring interactions. Most approaches tackle these problems by combining multi-stream information and even introducing extra knowledge, but they suffer from a huge combination space and from domination by non-interactive pairs. In this paper, we propose an Action-Guided attention mining and Relation Reasoning (AGRR) network to address these problems. Relation reasoning on human-object pairs exploits the contextual compatibility consistency among pairs to filter out non-interactive combinations. To better discriminate the subtle differences between fine-grained actions, we propose an action-aware attention based on class activation maps that mines the most relevant features for recognizing HOIs. Extensive experiments on the V-COCO and HICO-DET datasets demonstrate the effectiveness of the proposed model compared with state-of-the-art approaches.
Keywords:
Computer Vision: Action Recognition
Computer Vision: Structural and Model-Based Approaches, Knowledge Representation and Reasoning
Machine Learning: Relational Learning
Machine Learning: Deep Learning: Convolutional networks