Predicting the Visual Focus of Attention in Multi-Person Discussion Videos

Predicting the Visual Focus of Attention in Multi-Person Discussion Videos

Chongyang Bai, Srijan Kumar, Jure Leskovec, Miriam Metzger, Jay F. Nunamaker, V. S. Subrahmanian

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 4504-4510. https://doi.org/10.24963/ijcai.2019/626

Visual focus of attention in multi-person discussions is a crucial nonverbal indicator in tasks such as inter-personal relation inference, speech transcription, and deception detection. However, predicting the focus of attention remains a challenge because the focus changes rapidly, the discussions are highly dynamic, and the people's behaviors are inter-dependent. Here we propose ICAF (Iterative Collective Attention Focus), a collective classification model to jointly learn the visual focus of attention of all people. Every person is modeled using a separate classifier. ICAF models the people collectively---the predictions of all other people's classifiers are used as inputs to each person's classifier. This explicitly incorporates inter-dependencies between all people's behaviors. We evaluate ICAF on a novel dataset of 5 videos (35 people, 109 minutes, 7604 labels in all) of the popular Resistance game and a widely-studied meeting dataset with supervised prediction. See our demo at https://cs.dartmouth.edu/dsail/demos/icaf. ICAF outperforms the strongest baseline by 1%--5% accuracy in predicting the people's visual focus of attention. Further, we propose a lightly supervised technique to train models in the absence of training labels. We show that light-supervised ICAF performs similar to the supervised ICAF, thus showing its effectiveness and generality to previously unseen videos.
Keywords:
Machine Learning Applications: Applications of Supervised Learning
Machine Learning Applications: Applications of Unsupervised Learning