Perturb, Predict & Paraphrase: Semi-Supervised Learning using Noisy Student for Image Captioning

Arjit Jain, Pranay Reddy Samala, Preethi Jyothi, Deepak Mittal, Maneesh Singh

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 758-764. https://doi.org/10.24963/ijcai.2021/105

Recent semi-supervised learning (SSL) methods are predominantly focused on multi-class classification tasks. Classification tasks allow for easy mixing of class labels during augmentation, which does not trivially extend to structured outputs such as the word sequences that appear in tasks like image captioning. Noisy Student Training is a recent SSL paradigm for image classification that extends self-training and teacher-student learning. In this work, we provide an in-depth analysis of the noisy student SSL framework for the task of image captioning and derive state-of-the-art results. The original algorithm relies on computationally expensive data augmentation steps that involve perturbing the raw images and computing features for each perturbed image. We show that, even in the absence of raw image augmentation, simple model and feature perturbations to the student model's inputs are beneficial to SSL training. We also show how a paraphrase generator can be used effectively for label augmentation to improve the quality of pseudo labels and significantly improve performance. Our final results in the limited labeled data setting (1% of the MS-COCO labeled data) outperform previous state-of-the-art approaches by 2.5 BLEU4 points and 11.5 CIDEr points.
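The perturb-predict-paraphrase loop described above can be sketched as follows. This is a minimal illustration of the general noisy-student self-training pattern applied to captioning, not the authors' implementation; `train_fn`, `paraphrase_fn`, and `perturb_fn` are hypothetical placeholders for the captioning model trainer, the paraphrase generator, and the feature/model perturbation.

```python
# Minimal sketch of noisy-student self-training for image captioning.
# All function and model names are illustrative placeholders, not the
# authors' actual code.

def train_noisy_student(labeled, unlabeled, train_fn, paraphrase_fn,
                        perturb_fn, rounds=3):
    """labeled: list of (image, caption) pairs; unlabeled: list of images.
    train_fn(data) -> model exposing .predict(image) -> caption (assumed API).
    """
    teacher = train_fn(labeled)
    for _ in range(rounds):
        # Predict: the teacher produces pseudo captions for unlabeled images.
        pseudo = [(img, teacher.predict(img)) for img in unlabeled]
        # Paraphrase: augment pseudo labels with generated paraphrases.
        pseudo += [(img, paraphrase_fn(cap)) for img, cap in pseudo]
        # Perturb: train the student on perturbed inputs (feature/model
        # noise rather than raw-image augmentation).
        student = train_fn([(perturb_fn(img), cap)
                            for img, cap in labeled + pseudo])
        teacher = student  # the student becomes the next round's teacher
    return teacher
```

The key design point, per the abstract, is that the perturbation is applied only on the student side, so the teacher's pseudo labels stay clean while the student learns to be robust to noise.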
Keywords:
Computer Vision: Language and Vision
Machine Learning: Semi-Supervised Learning