Crowd Scene Understanding with Coherent Recurrent Neural Networks / 3469
Hang Su, Yinpeng Dong, Jun Zhu, Haibin Ling, Bo Zhang
Exploring crowd dynamics is essential in understanding crowd scenes, which still remains as a challenging task due to the nonlinear characteristics and coherent spatio-temporal motion patterns in crowd behaviors. To address these issues, we present a Coherent Long Short Term Memory (cLSTM) network to capture the nonlinear crowd dynamics by learning an informative representation of crowd motions, which facilitates the critical tasks in crowd scene analysis. By describing the crowd motion patterns with a cloud of keypoint tracklets, we explore the nonlinear crowd dynamics embedded in the tracklets with a stacked LSTM model, which is further improved to capture the collective properties by introducing a coherent regularization term; and finally, we adopt an unsupervised encoder-decoder framework to learn a hidden feature for each input tracklet that embeds its inherent dynamics. With the learnt features properly harnessed, crowd scene understanding is conducted effectively in predicting the future paths of agents, estimating group states, and classifying crowd events. Extensive experiments on hundreds of public crowd videos demonstrate that our method is state-of-the-art performance by exploring the coherent spatio-temporal structures in crowd behaviors.