EntScene: Nonparametric Bayesian Temporal Segmentation of Videos Aimed at Entity-Driven Scene Detection / 3721
Adway Mitra, Chiranjib Bhattacharyya, Soma Biswas
In this paper, we study Bayesian techniques for entity discovery and temporal segmentation of videos. Existing temporal video segmentation techniques are based on low-level features, and are usually suitable for discovering short, homogeneous shots rather than diverse scenes, each of which contains several such shots. We define scenes in terms of semantic entities (eg. persons). This is the first attempt at entity-driven scene discovery in videos, without using meta-data like scripts. The problem is hard because we have no explicit prior information about the entities and the scenes. However such sequential data exhibit temporal coherence in multiple ways, and this provides implicit cues. To capture these, we propose a Bayesian generative model- EntScene, that represents entities with mixture components and scenes with discrete distributions over these components. The most challenging part of this approach is the inference, as it involves complex interactions of latent variables. To this end, we propose an algorithm based on Dynamic Blocked Gibbs Sampling, that attempts to jointly learn the components and the segmentation, by progressively merging an initial set of short segments. The proposed algorithm compares favourably against suitably designed baselines on several TV-series videos. We extend the method to an unexplored problem: temporal co-segmentation of videos containing same entities.