PredCNN: Predictive Learning with Cascade Convolutions

PredCNN: Predictive Learning with Cascade Convolutions

Ziru Xu, Yunbo Wang, Mingsheng Long, Jianmin Wang

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 2940-2947. https://doi.org/10.24963/ijcai.2018/408

Predicting future frames in videos remains an unsolved but challenging problem. Mainstream recurrent models suffer from huge memory usage and computation cost, while convolutional models are unable to effectively capture the temporal dependencies between consecutive video frames. To tackle this problem, we introduce an entirely CNN-based architecture, PredCNN, that models the dependencies between the next frame and the sequential video inputs. Inspired by the core idea of recurrent models that previous states have more transition operations than future states, we design a cascade multiplicative unit (CMU) that provides relatively more operations for previous video frames. This newly proposed unit enables PredCNN to predict future spatiotemporal data without any recurrent chain structures, which eases gradient propagation and enables a fully paralleled optimization. We show that PredCNN outperforms the state-of-the-art recurrent models for video prediction on the standard Moving MNIST dataset and two challenging crowd flow prediction datasets, and achieves a faster training speed and lower memory footprint.
Keywords:
Machine Learning: Deep Learning
Computer Vision: Video: Events, Activities and Surveillance