Sharing Residual Units Through Collective Tensor Factorization To Improve Deep Neural Networks

Yunpeng Chen, Xiaojie Jin, Bingyi Kang, Jiashi Feng, Shuicheng Yan

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 635-641. https://doi.org/10.24963/ijcai.2018/88

The residual unit and its variations are widely used in building very deep neural networks to alleviate optimization difficulty. In this work, we revisit the standard residual function and several of its successful variants, and propose a unified framework based on tensor Block Term Decomposition (BTD) that explains these apparently different residual functions from a tensor decomposition view. Building on the BTD framework, we propose a novel basic network architecture, the Collective Residual Unit (CRU). CRU improves the parameter efficiency of deep residual neural networks by sharing core factors, derived from collective tensor factorization, across the involved residual units. This sharing enables efficient knowledge transfer across multiple residual units, reduces the number of model parameters, lowers the risk of over-fitting, and improves generalization. Extensive experimental results show that the proposed CRU network is highly parameter-efficient: it achieves classification performance comparable to ResNet-200 with a model size as small as ResNet-50 on the ImageNet-1k and Places365-Standard benchmark datasets.
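
As a rough illustration of why sharing a core factor across residual units reduces model size, the following sketch counts convolution weights for a stack of units with and without a shared core. The three-part unit layout (1x1 reduce, 3x3 core, 1x1 expand), the channel sizes, and the function names are assumptions chosen for illustration; they are not the exact CRU configuration from the paper.

```python
# Illustrative parameter count only. The unit layout and sizes below are
# assumptions for this sketch, not the paper's exact CRU design.

def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def unit_params(channels, rank):
    """Weights of one factorized unit: 1x1 reduce, 3x3 core, 1x1 expand."""
    reduce_w = conv_params(channels, rank, 1)
    core_w = conv_params(rank, rank, 3)
    expand_w = conv_params(rank, channels, 1)
    return reduce_w, core_w, expand_w

def total_params(n_units, channels, rank, share_core):
    """Total weights for n_units units, optionally storing the core once."""
    reduce_w, core_w, expand_w = unit_params(channels, rank)
    per_unit = reduce_w + expand_w
    if share_core:
        # The core is stored a single time and reused by every unit.
        return n_units * per_unit + core_w
    # Each unit keeps its own private copy of the core.
    return n_units * (per_unit + core_w)

independent = total_params(4, 256, 64, share_core=False)
shared = total_params(4, 256, 64, share_core=True)
print(independent, shared)
```

Under these assumed sizes, the shared variant stores the core once instead of four times, so the saving grows with the number of units that reuse the same core.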
Keywords:
Computer Vision: Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation
Computer Vision: 2D and 3D Computer Vision
Computer Vision: Computer Vision