VideoMaster: A Multimodal Micro Game Video Recreator

Yipeng Yu, Xiao Chen, Hui Zhan

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Demo Track. Pages 7179-7182. https://doi.org/10.24963/ijcai.2023/844

To free humans from laborious video production, this paper proposes VideoMaster, a multimodal system equipped with four capabilities: highlight extraction, video description, video dubbing, and video editing. It extracts interesting episodes from long game videos, generates subtitles for each episode, reads the subtitles aloud with synthesized speech, and finally re-creates a better short video through video editing. Notably, VideoMaster combines deep learning with traditional computer vision techniques to extract highlights with fine-to-coarse labels, uses a novel framework named PCSG-v (probabilistic context-sensitive grammar for video) for video description generation, and imitates a target speaker's voice to read the description. To the best of our knowledge, VideoMaster is the first multimedia system that can automatically produce product-level micro-videos without heavy human annotation.
Keywords:
Multidisciplinary Topics and Applications: MDA: Arts and creativity
Computer Vision: CV: Applications
Computer Vision: CV: Segmentation
Computer Vision: CV: Video analysis and understanding
Computer Vision: CV: Vision and language
Machine Learning: ML: Multi-modal learning
Natural Language Processing: NLP: Applications
Natural Language Processing: NLP: Language generation
Natural Language Processing: NLP: Speech