DBDNet: Learning Bi-directional Dynamics for Early Action Prediction

Guoliang Pang, Xionghui Wang, Jian-Fang Hu, Qing Zhang, Wei-Shi Zheng

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 897-903. https://doi.org/10.24963/ijcai.2019/126

Predicting future actions from observed partial videos is very challenging, as the missing future is uncertain and may have multiple possibilities. To obtain a reliable future estimation, we propose a novel encoder-decoder architecture that integrates two tasks in a unified framework: synthesizing future motions from observed videos, and reconstructing observed motions from the synthesized future motions. This allows the model to capture the bi-directional dynamics depicted in partial videos along both the temporal (past-to-future) direction and the reverse chronological (future-back-to-past) direction. We then employ a bi-directional long short-term memory (Bi-LSTM) architecture to exploit the learned bi-directional dynamics for early action prediction. Our experiments on two benchmark action datasets show that learning bi-directional dynamics benefits early action prediction and that our system clearly outperforms state-of-the-art methods.
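The bi-directional idea in the abstract can be illustrated with a minimal sketch: a forward recurrent pass encodes the observed sequence and synthesizes future features, a backward pass re-encodes that synthesized future and tries to reconstruct the observed past, and the final states of both passes are concatenated as a Bi-LSTM-style feature. This is purely illustrative pseudocode-in-Python, not the paper's DBDNet: the scalar tanh cells, the hardcoded weights, and the function names (`encode`, `decode`, `bidirectional_features`) are all assumptions made for brevity.

```python
import math

def rnn_step(x, h, W_x, W_h):
    # One tanh recurrent step: h' = tanh(W_x*x + W_h*h).
    # Scalars stand in for the weight matrices of a real LSTM cell.
    return math.tanh(W_x * x + W_h * h)

def encode(seq, W_x, W_h):
    # Run the recurrent cell over a sequence, collecting hidden states.
    h, states = 0.0, []
    for x in seq:
        h = rnn_step(x, h, W_x, W_h)
        states.append(h)
    return states

def decode(h0, steps, W_h, W_out):
    # Autoregressively roll out `steps` feature values from a state.
    h, out = h0, []
    for _ in range(steps):
        h = math.tanh(W_h * h)
        out.append(W_out * h)
    return out

def bidirectional_features(observed, future_steps=3):
    # Forward direction: observed video -> synthesized future motions
    # (past-to-future dynamics).
    fwd = encode(observed, W_x=0.8, W_h=0.5)
    future = decode(fwd[-1], future_steps, W_h=0.6, W_out=1.0)

    # Reverse direction: synthesized future -> reconstructed past
    # (future-back-to-past dynamics).
    bwd = encode(list(reversed(future)), W_x=0.8, W_h=0.5)
    recon = decode(bwd[-1], len(observed), W_h=0.6, W_out=1.0)

    # Reconstruction loss ties the two directions together in training.
    target = list(reversed(observed))
    loss = sum((a - b) ** 2 for a, b in zip(recon, target)) / len(observed)

    # Bi-LSTM-style feature for the action classifier:
    # concatenate the final forward and backward states.
    return [fwd[-1], bwd[-1]], loss
```

In a real system the classifier would consume the concatenated bi-directional feature, and the reconstruction loss would be minimized jointly with the prediction loss so that the synthesized future stays consistent with the observed past.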
Keywords:
Computer Vision: Action Recognition
Computer Vision: Video: Events, Activities and Surveillance
Computer Vision: Computer Vision