MultiQuant: Training Once for Multi-bit Quantization of Neural Networks

Ke Xu, Qiantai Feng, Xingyi Zhang, Dong Wang

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 3629-3635. https://doi.org/10.24963/ijcai.2022/504

Quantization has become a popular technique to compress deep neural networks (DNNs) and reduce computational costs, but most prior work focuses on training DNNs at each individual fixed bit-width and accuracy trade-off point; how to produce a single model with flexible precision remains largely unexplored. This work proposes a multi-bit quantization framework (MultiQuant) that makes the learned DNNs robust to different precision configurations during inference by adopting a Lowest-Random-Highest bit-width co-training method. Meanwhile, we propose an online adaptive label generation strategy to alleviate the vicious competition among different precisions caused by one-hot labels during supernet training. The trained supernet can be flexibly set to different bit-widths to support dynamic speed-accuracy trade-offs. Furthermore, we adopt a Monte Carlo sampling-based genetic algorithm search strategy with a quantization-aware accuracy predictor as the evaluation criterion to incorporate mixed-precision quantization into our framework. Experimental results on the ImageNet dataset demonstrate that MultiQuant attains quantization results under different bit-widths comparable with quantization-aware training, without retraining.
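
To make the Lowest-Random-Highest co-training idea concrete, the following is a minimal sketch, not the authors' code: at each step the shared supernet is evaluated at the lowest bit-width, the highest bit-width, and a few randomly sampled intermediate bit-widths, and their gradients are accumulated before a single optimizer update. The QuantSupernet model and its set_bitwidth() method are hypothetical placeholders for whatever switchable fake-quantization mechanism is used; the paper's online adaptive label generation (soft targets instead of one-hot labels) is omitted here for brevity.

    # Minimal sketch of Lowest-Random-Highest bit-width co-training (assumptions noted above)
    import random
    import torch

    def lrh_training_step(model, optimizer, criterion, images, labels,
                          bit_widths=(2, 3, 4, 5, 6, 7, 8), num_random=2):
        """One co-training step over the lowest, a few random, and the highest bit-widths."""
        optimizer.zero_grad()
        sampled = [min(bit_widths), max(bit_widths)]
        sampled += random.sample(
            [b for b in bit_widths if b not in sampled], num_random)
        for bits in sampled:
            model.set_bitwidth(bits)   # hypothetical API: reconfigure quantizers to this precision
            loss = criterion(model(images), labels)
            loss.backward()            # accumulate gradients across bit-widths
        optimizer.step()

After such co-training, inference-time precision can be chosen by calling the same bit-width switch once, which is what enables the dynamic speed-accuracy trade-off described above.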
Keywords:
Machine Learning: Automated Machine Learning
Machine Learning: Convolutional Networks
Machine Learning: Robustness
Machine Learning: Classification