Hierarchical Temporal Multi-Instance Learning for Video-based Student Learning Engagement Assessment

Jiayao Ma, Xinbo Jiang, Songhua Xu, Xueying Qin

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 2782-2789. https://doi.org/10.24963/ijcai.2021/383

Video-based automatic assessment of a student's learning engagement on the fly can provide immense value for delivering personalized instructional services, which is particularly important for massive online education. A major challenge in training such an assessor lies in collecting sufficient labels at the appropriate temporal granularity, since a learner's engagement status may change continuously throughout a study session. Supplying labels at either the frame or the clip level incurs a high annotation cost. To overcome this challenge, this paper proposes a novel hierarchical multiple instance learning (MIL) solution that requires only labels anchored on full-length videos, yet learns to assess student engagement at an arbitrary temporal granularity and for an arbitrary duration in a study session. The hierarchical model mainly comprises a bottom module and a top module. The bottom module learns the latent relationship between a clip and its constituent frames, and the top module learns that between a video and its constituent clips. Training is constrained so that the average engagement of a video's local clips equals the video's label. To verify the effectiveness of our method, we compare the proposed approach with several state-of-the-art peer solutions through extensive experiments.
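
As a rough illustration of the training constraint described above (the average engagement of a video's clips should match the video-level label), the following Python sketch may help. It is not the paper's model: the mean pooling of frame scores into clip scores, the squared-error penalty, and the names `clip_scores` and `video_consistency_loss` are all assumptions made for illustration, and the abstract does not specify how either module is actually implemented.

```python
from statistics import fmean


def clip_scores(frame_scores, clip_len):
    """Bottom level (assumed): pool per-frame scores into one score per clip.

    Mean pooling is a stand-in here; the paper's bottom module learns
    the frame-to-clip relationship rather than fixing it.
    """
    return [
        fmean(frame_scores[i:i + clip_len])
        for i in range(0, len(frame_scores), clip_len)
    ]


def video_consistency_loss(clips, video_label):
    """Top level (assumed): squared gap between mean clip score and video label."""
    return (fmean(clips) - video_label) ** 2


# Hypothetical per-frame engagement scores for one video, and its only label.
frame_scores = [0.2, 0.4, 0.6, 0.8, 1.0, 1.0]
clips = clip_scores(frame_scores, clip_len=2)
loss = video_consistency_loss(clips, video_label=0.5)
print(clips, loss)
```

The point of the sketch is only that supervision enters through the video-level label, while the per-clip scores remain available as finer-grained engagement estimates.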
Keywords:
Machine Learning: Multi-instance; Multi-label; Multi-view learning
Computer Vision: Video: Events, Activities and Surveillance
Humans and AI: Computer-Aided Education