High Performance Gesture Recognition via Effective and Efficient Temporal Modeling

High Performance Gesture Recognition via Effective and Efficient Temporal Modeling

Yang Yi, Feng Ni, Yuexin Ma, Xinge Zhu, Yuankai Qi, Riming Qiu, Shijie Zhao, Feng Li, Yongtao Wang

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 1003-1009. https://doi.org/10.24963/ijcai.2019/141

State-of-the-art hand gesture recognition methods have investigated the spatiotemporal features based on 3D convolutional neural networks (3DCNNs) or convolutional long short-term memory (ConvLSTM). However, they often suffer from the inefficiency due to the high computational complexity of their network structures. In this paper, we focus instead on the 1D convolutional neural networks and propose a simple and efficient architectural unit, Multi-Kernel Temporal Block (MKTB), that models the multi-scale temporal responses by explicitly applying different temporal kernels. Then, we present a Global Refinement Block (GRB), which is an attention module for shaping the global temporal features based on the cross-channel similarity. By incorporating the MKTB and GRB, our architecture can effectively explore the spatiotemporal features within tolerable computational cost. Extensive experiments conducted on public datasets demonstrate that our proposed model achieves the state-of-the-art with higher efficiency. Moreover, the proposed MKTB and GRB are plug-and-play modules and the experiments on other tasks, like video understanding and video-based person re-identification, also display their good performance in efficiency and capability of generalization.
Keywords:
Computer Vision: Biometrics, Face and Gesture Recognition
Computer Vision: Action Recognition