Accelerated Inference Framework of Sparse Neural Network Based on Nested Bitmask Structure
Yipeng Zhang, Bo Du, Lefei Zhang, Rongchun Li, Yong Dou

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 4355-4361. https://doi.org/10.24963/ijcai.2019/605

In order to satisfy the ever-growing demand for high-performance neural network processors, state-of-the-art processing units tend to replace the GPU Processing Engine (PE) with application-oriented circuits in scenarios where low-power solutions are required. An application-oriented PE is fully optimized at the circuit-architecture level and eliminates false data dependencies and instruction redundancy. In this paper, we propose a novel encoding approach for a sparse neural network after pruning. We partition the weight matrix into numerous blocks and use a low-rank binary map to indicate which blocks contain nonzero elements. Furthermore, the elements of each nonzero block are encoded into two submatrices: one is a binary stream marking the zero/nonzero positions, while the other holds the nonzero elements themselves, stored in a FIFO. In the experimental part, we implement a well pre-trained sparse neural network on a Xilinx VC707 FPGA. Experimental results show that our algorithm outperforms the other benchmarks, optimizing both the throughput and the energy efficiency for processing a single frame. Accordingly, we contend that the Nested Bitmask Neural Network (NBNN) is an efficient neural network structure with only minor accuracy loss on an SoC system.
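The two-level encoding described above can be illustrated in software. The following is a minimal sketch, not the authors' hardware implementation: it assumes a square block size that divides the matrix dimensions, represents the block-level map and per-block bitmasks as NumPy arrays, and models the FIFO as a flat list of nonzero values in block-then-element order.

```python
import numpy as np

def nbnn_encode(W, bs):
    """Encode sparse matrix W with a nested (two-level) bitmask.

    Returns:
      block_mask - one bit per bs x bs block (1 = block has any nonzero)
      elem_masks - for each nonzero block, one bit per element (1 = nonzero)
      values     - the nonzero elements, in block-then-row-major order (the FIFO)
    """
    rows, cols = W.shape
    assert rows % bs == 0 and cols % bs == 0
    block_mask, elem_masks, values = [], [], []
    for bi in range(0, rows, bs):
        for bj in range(0, cols, bs):
            blk = W[bi:bi + bs, bj:bj + bs]
            nz = blk != 0
            if nz.any():
                block_mask.append(1)
                elem_masks.append(nz.astype(np.uint8).ravel())
                values.extend(blk[nz].tolist())
            else:
                # All-zero block: stored as a single 0 bit, no element mask.
                block_mask.append(0)
    return block_mask, elem_masks, values

def nbnn_decode(block_mask, elem_masks, values, shape, bs):
    """Reconstruct the dense matrix from the nested-bitmask encoding."""
    W = np.zeros(shape)
    rows, cols = shape
    fifo = iter(values)
    blk_idx = 0   # index into block_mask
    nz_idx = 0    # index into elem_masks (nonzero blocks only)
    for bi in range(0, rows, bs):
        for bj in range(0, cols, bs):
            if block_mask[blk_idx]:
                mask = elem_masks[nz_idx].reshape(bs, bs).astype(bool)
                blk = np.zeros((bs, bs))
                blk[mask] = [next(fifo) for _ in range(int(mask.sum()))]
                W[bi:bi + bs, bj:bj + bs] = blk
                nz_idx += 1
            blk_idx += 1
    return W
```

The nesting is what saves storage: an all-zero block costs a single bit at the outer level, and the inner per-element bitmask is only stored for blocks the outer map marks as valid.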
Keywords:
Machine Learning: Ensemble Methods
Knowledge Representation and Reasoning: Reasoning about Knowledge and Belief
Machine Learning: Feature Selection; Learning Sparse Models
Machine Learning: Deep Learning
Computer Vision: Computer Vision