Learning Low-precision Neural Networks without Straight-Through Estimator (STE)

Zhi-Gang Liu, Matthew Mattina

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 3066-3072. https://doi.org/10.24963/ijcai.2019/425

The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the technique lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blending (AB), which quantizes neural networks to low precision using stochastic gradient descent (SGD). Our AB method avoids the STE approximation by replacing the quantized weight in the loss function with an affine combination of the quantized weight w_q and the corresponding full-precision weight w, with non-trainable scalar coefficients alpha and (1 - alpha). During training, alpha is gradually increased from 0 to 1; the gradient updates to the weights flow through the full-precision term, (1 - alpha) * w, of the affine combination, so the model is converted from full precision to low precision progressively. To evaluate the AB method, a 1-bit BinaryNet on the CIFAR-10 dataset and 8-bit and 4-bit MobileNet v1 and ResNet_50 v1/2 models on ImageNet are trained using the alpha-blending approach. The evaluation indicates that AB improves top-1 accuracy by 0.9%, 0.82% and 2.93%, respectively, compared to the results of STE-based quantization.
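
To illustrate the idea described above, the following is a minimal PyTorch sketch of the alpha-blended weight. The symmetric uniform quantizer, the function name, and the per-tensor scaling are assumptions chosen for illustration; the paper's exact quantizer and alpha schedule may differ.

    import torch

    def alpha_blend_weight(w: torch.Tensor, alpha: float, num_bits: int = 8) -> torch.Tensor:
        # Hypothetical symmetric uniform quantizer (illustrative only):
        # scale the tensor so its max magnitude maps to the integer grid,
        # round, clamp, and rescale back.
        qmax = 2 ** (num_bits - 1) - 1                      # e.g. 127 for 8 bits
        scale = w.detach().abs().max().clamp(min=1e-8) / qmax
        w_q = torch.round(w / scale).clamp(-qmax, qmax) * scale
        # Detach w_q so no gradient flows through the non-differentiable
        # rounding; gradients reach w only via the (1 - alpha) * w term,
        # so no straight-through approximation is needed.
        return alpha * w_q.detach() + (1.0 - alpha) * w

In use, the blended weight would replace the quantized weight in the forward pass, with alpha ramped from 0 to 1 over training (for example, alpha = min(1.0, step / total_steps)); at alpha = 1 the forward pass uses purely quantized weights.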
Keywords:
Machine Learning: Classification
Machine Learning: Deep Learning