Open-Ended Long-form Video Question Answering via Adaptive Hierarchical Reinforced Networks

Zhou Zhao; Zhu Zhang; Shuwen Xiao; Zhou Yu; Jun Yu; Deng Cai; Fei Wu; Yueting Zhuang

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence

Main track. Pages 3683-3689. https://doi.org/10.24963/ijcai.2018/512


Open-ended long-form video question answering is a challenging problem in visual information retrieval: given a question, the system must automatically generate a natural language answer from the referenced long-form video content. However, existing video question answering work focuses mainly on short-form videos, because it lacks a way to model the semantic representation of long-form video content. In this paper, we approach long-form video question answering from the viewpoint of adaptive hierarchical reinforced encoder-decoder network learning. We propose an adaptive hierarchical encoder network that learns a joint representation of the long-form video content and the question through adaptive video segmentation. We then develop a reinforced decoder network that generates the natural language answer for open-ended video question answering. We also construct a large-scale long-form video question answering dataset, and extensive experiments show the effectiveness of our method.
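The abstract describes the architecture only at a high level, so the following is a minimal, hypothetical sketch of the general shape of such a pipeline, not the authors' method. It assumes frames arrive as pre-extracted feature vectors and the question as a single vector of the same size. The segmentation rule (cut wherever the frame's relevance to the question jumps by more than a threshold), mean pooling inside segments, and dot-product attention over segments are all illustrative choices of ours; the paper learns its segmentation. The decoder's training signal is shown as a generic REINFORCE policy-gradient loss with an assumed reward and baseline, since the abstract does not specify the reward. All function names (`adaptive_segments`, `encode_video`, `reinforce_loss`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def adaptive_segments(frames: List[Vector], question: Vector,
                      threshold: float) -> List[List[Vector]]:
    """Cut the frame sequence wherever question relevance jumps by more than `threshold`.

    Hypothetical boundary rule standing in for a learned, question-conditioned segmenter.
    """
    relevance = [cosine(f, question) for f in frames]
    segments = [[frames[0]]]
    for t in range(1, len(frames)):
        if abs(relevance[t] - relevance[t - 1]) > threshold:
            segments.append([frames[t]])    # open a new segment
        else:
            segments[-1].append(frames[t])  # extend the current segment
    return segments


def mean(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_video(frames: List[Vector], question: Vector,
                 threshold: float = 0.5) -> Tuple[Vector, int]:
    """Two-level encoding: frames -> segment vectors -> question-attended video vector."""
    segments = adaptive_segments(frames, question, threshold)
    seg_vecs = [mean(s) for s in segments]                   # level 1: pool frames per segment
    weights = softmax([dot(v, question) for v in seg_vecs])  # level 2: attend over segments
    joint = [sum(w * v[i] for w, v in zip(weights, seg_vecs))
             for i in range(len(question))]
    return joint, len(segments)


def reinforce_loss(token_log_probs: List[float], reward: float, baseline: float) -> float:
    """REINFORCE loss for one sampled answer: -(reward - baseline) * log p(answer)."""
    return -(reward - baseline) * sum(token_log_probs)


if __name__ == "__main__":
    frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    question = [1.0, 0.0]
    joint, n_segments = encode_video(frames, question)
    print(n_segments, [round(x, 4) for x in joint])  # 2 [0.7311, 0.2689]
    print(round(reinforce_loss([math.log(0.5)] * 2, reward=1.0, baseline=0.5), 4))  # 0.6931
```

With the toy input, the two question-relevant frames form one segment and receive most of the attention weight. In the loss, subtracting a baseline from the reward reduces the variance of the policy-gradient estimate without changing its expectation.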

Keywords:

Machine Learning: Data Mining

Machine Learning Applications: Applications of Supervised Learning