A Survey on Masked Autoencoder for Visual Self-supervised Learning

Chaoning Zhang, Chenshuang Zhang, Junha Song, John Seon Keun Yi, In So Kweon

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Survey Track. Pages 6805-6813. https://doi.org/10.24963/ijcai.2023/762

With the increasing popularity of masked autoencoders, self-supervised learning (SSL) in vision is following a trajectory similar to that in NLP. Specifically, generative pretext tasks based on masked prediction have become the de facto standard SSL practice in NLP (e.g., BERT). By contrast, early generative methods in vision were outperformed by their discriminative counterparts (such as contrastive learning). However, the success of masked image modeling has revived autoencoder-based visual pretraining. As a milestone in bridging the gap with BERT in NLP, the masked autoencoder in vision has attracted unprecedented attention. This work surveys masked autoencoders for visual SSL.
Keywords:
Survey: Computer Vision
Survey: Machine Learning