Survey on Efficient Training of Large Neural Networks

Julia Gusak; Daria Cherniuk; Alena Shilova; Alexandr Katrutsa; Daniel Bershatsky; Xunyi Zhao; Lionel Eyraud-Dubois; Oleh Shliazhko; Denis Dimitrov; Ivan Oseledets; Olivier Beaumont

doi:10.24963/ijcai.2022/769

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence

Survey Track. Pages 5494-5501. https://doi.org/10.24963/ijcai.2022/769

Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training. Hence, many models do not fit on a single GPU device or can be trained only with a small per-GPU batch size. This survey provides a systematic overview of approaches that enable more efficient DNN training. We analyze techniques that save memory and make good use of computation and communication resources on architectures with a single or several GPUs. We summarize the main categories of strategies and compare strategies within and across categories. Along with approaches proposed in the literature, we discuss available implementations.

Keywords:

Survey Track: Machine Learning

Survey Track: Natural Language Processing

Survey Track: Computer Vision