KCNN: Kernel-wise Quantization to Remarkably Decrease Multiplications in Convolutional Neural Network

Linghua Zeng, Zhangcheng Wang, Xinmei Tian

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 4234-4242. https://doi.org/10.24963/ijcai.2019/588

Convolutional neural networks (CNNs) have demonstrated state-of-the-art performance in computer vision tasks. However, the high computational cost of recent CNNs on the devices that run them has hampered many of their applications. Recently, many methods have quantized floating-point weights and activations to fixed-point or binary values, converting fractional arithmetic to integer or bit-wise arithmetic. However, because the value distributions in CNNs are highly complex, fixed-point or binary representations lose numerical information and degrade performance. Meanwhile, convolution consists of multiplications and accumulations, and multiplications are more costly to implement in hardware than accumulations. By considerably decreasing the number of multiplications, we can preserve the rich information of floating-point values on dedicated low-power devices. In this paper, we quantize the floating-point weights in each kernel separately into multiple bit planes to remarkably decrease multiplications. We obtain a closed-form solution via an aggressive Lloyd algorithm, and fine-tuning is adopted to further optimize the bit planes. Furthermore, we propose dual normalization to solve the pathological curvature problem during fine-tuning. Our quantized networks show negligible performance loss compared to their floating-point counterparts.
Keywords:
Machine Learning: Classification
Machine Learning: Deep Learning
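
The abstract does not include code, so the NumPy sketch below is only a rough, hypothetical illustration of the core idea (not the authors' exact bit-plane formulation): each kernel's weights are quantized with a 1-D Lloyd (k-means) codebook, and the convolution then accumulates inputs that share a codeword before scaling, so each output position needs only one multiplication per codeword rather than one per weight. The function names and the choice of four levels are illustrative assumptions.

```python
import numpy as np

def lloyd_quantize_kernel(w, num_levels=4, iters=20):
    """1-D Lloyd (k-means) quantization of a single kernel's weights.
    Returns a small per-kernel codebook and per-weight level indices.
    (Illustrative stand-in for the paper's kernel-wise quantization.)"""
    flat = w.ravel()
    # Initialize codewords from quantiles of the weight distribution.
    codebook = np.quantile(flat, np.linspace(0.0, 1.0, num_levels))
    for _ in range(iters):
        # Assignment step: nearest codeword for every weight.
        idx = np.argmin(np.abs(flat[:, None] - codebook[None, :]), axis=1)
        # Update step: each codeword becomes the mean of its cluster.
        for k in range(num_levels):
            if np.any(idx == k):
                codebook[k] = flat[idx == k].mean()
    return codebook, idx.reshape(w.shape)

def conv2d_few_multiplications(x, codebook, idx):
    """Valid 2-D cross-correlation (as used in CNNs) where inputs sharing
    a codeword are accumulated first, so only len(codebook) multiplications
    are performed per output position instead of one per kernel weight."""
    kh, kw = idx.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw]
            # Accumulate inputs per level, then scale once per level.
            acc = np.array([patch[idx == k].sum() for k in range(len(codebook))])
            out[i, j] = np.dot(acc, codebook)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(3, 3))          # one convolution kernel
    x = rng.normal(size=(8, 8))          # one input feature map
    codebook, idx = lloyd_quantize_kernel(w, num_levels=4)
    y_quant = conv2d_few_multiplications(x, codebook, idx)
    # Reference: ordinary cross-correlation with the quantized kernel.
    w_q = codebook[idx]
    y_ref = np.array([[np.sum(x[i:i + 3, j:j + 3] * w_q) for j in range(6)]
                      for i in range(6)])
    print(np.allclose(y_quant, y_ref))   # True: same output, far fewer multiplications
```

With 4 codewords and a 3x3 kernel, the grouped accumulation replaces 9 multiplications per output position with 4; the paper's bit-plane construction and fine-tuning pursue the same trade-off with a closed-form solution rather than this plain k-means sketch.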