Semantic Question-Answering with Video and Eye-Tracking Data: AI Foundations for Human Visual Perception Driven Cognitive Film Studies
Jakob Suchan, Mehul Bhatt
We present a computational framework for the grounding and semantic interpretation of dynamic visuo-spatial imagery consisting of video and eye-tracking data. Driven by cognitive film studies and visual perception research, we demonstrate key technological capabilities aimed at investigating attention and recipient effects vis-à-vis the motion picture; this encompasses high-level analysis of subjects' visual fixation patterns and their correlation with (deep) semantic analysis of the dynamic visual data (e.g., fixation on movie characters, influence of cinematographic devices such as cuts). We highlight the framework and its application as a general AI-based assistive technology platform for cognitive film studies, integrating vision and knowledge representation (KR).