WALKING WALKing walking: Action Recognition from Action Echoes

Qianli Ma, Lifeng Shen, Enhuan Chen, Shuai Tian, Jiabing Wang, Garrison W. Cottrell

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 2457-2463. https://doi.org/10.24963/ijcai.2017/342

Recognizing human actions represented by 3D trajectories of skeleton joints is a challenging machine learning task. In this paper, the 3D skeleton sequences are regarded as multivariate time series, and their dynamics and multiscale features are efficiently learned from action echo states. Specifically, the skeleton data from the limbs and trunk are first projected into five high-dimensional nonlinear spaces, which are randomly generated by five dynamic, training-free recurrent networks, i.e., the reservoirs of echo state networks (ESNs). In this way, the history of the time series is represented as nonlinear echo states of actions. We then use a single multiscale convolutional layer to extract multiscale features from the echo states, and achieve multiscale temporal invariance with a max-over-time pooling layer. We propose two multi-step fusion strategies to integrate the spatial information across the five parts of the human physical structure. Finally, we learn the label distribution with a softmax layer. With one training-free recurrent layer and only one convolutional layer, our Convolutional Echo State Network (ConvESN) is a highly efficient end-to-end model that achieves state-of-the-art performance on four skeleton benchmark data sets.
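To make the pipeline concrete, the following is a minimal NumPy sketch of the forward pass for a single body-part stream (reservoir echo states, one multiscale convolutional layer, max-over-time pooling, softmax). It is a sketch under stated assumptions, not the authors' implementation: all names, sizes, kernel widths, and hyperparameters are illustrative, only one of the five body-part streams is shown, the fusion strategies are omitted, and the convolutional and softmax weights are left random rather than trained.

    import numpy as np

    rng = np.random.default_rng(0)

    def make_reservoir(n_in, n_res, spectral_radius=0.9, sparsity=0.1):
        """Random, training-free ESN reservoir (input and recurrent weights)."""
        W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
        W = rng.uniform(-1.0, 1.0, (n_res, n_res)) * (rng.random((n_res, n_res)) < sparsity)
        # Rescale so the largest eigenvalue magnitude equals spectral_radius,
        # a common sufficient condition for the echo state property.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        return W_in, W

    def echo_states(U, W_in, W):
        """Map a (T, n_in) multivariate time series to its (T, n_res) echo states."""
        T, n_res = U.shape[0], W.shape[0]
        X = np.zeros((T, n_res))
        x = np.zeros(n_res)
        for t in range(T):
            x = np.tanh(W_in @ U[t] + W @ x)  # no training: weights stay fixed
            X[t] = x
        return X

    def multiscale_conv_maxpool(X, filters):
        """One convolutional layer with several kernel widths, then max-over-time pooling."""
        feats = []
        for F in filters:                     # F has shape (width, n_res, n_maps)
            w = F.shape[0]
            T = X.shape[0] - w + 1
            # Valid 1-D convolution over time for every feature map.
            conv = np.stack([np.tensordot(X[t:t + w], F, axes=([0, 1], [0, 1]))
                             for t in range(T)])        # (T, n_maps)
            feats.append(np.tanh(conv).max(axis=0))     # max over time
        return np.concatenate(feats)

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # Toy usage: one body-part stream, e.g. 3 joints x 3 coordinates = 9 channels.
    U = rng.standard_normal((60, 9))                    # T = 60 frames
    W_in, W = make_reservoir(n_in=9, n_res=100)
    X = echo_states(U, W_in, W)
    filters = [rng.standard_normal((w, 100, 16)) * 0.05 for w in (2, 3, 4)]
    h = multiscale_conv_maxpool(X, filters)             # 48-dim multiscale feature vector
    W_out = rng.standard_normal((h.size, 10)) * 0.05    # 10 hypothetical action classes
    print(softmax(h @ W_out))                           # label distribution

In the full model, each of the five body-part streams would pass through its own reservoir and convolutional layer before the fusion and softmax stages described in the abstract.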
Keywords:
Machine Learning: Neural Networks
Machine Learning: Time-series/Data Streams
Robotics and Vision: Vision and Perception