From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots

Shizhe Chen, Qin Jin, Jianlong Fu

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 4932-4938. https://doi.org/10.24963/ijcai.2019/685

Neural machine translation models suffer from the lack of large-scale parallel corpora. In contrast, humans can learn multi-lingual translations even without parallel texts, by relating their languages to the external world. To mimic this human learning behavior, we employ images as pivots to enable zero-resource translation learning. However, a picture is worth a thousand words: multi-lingual sentences pivoted by the same image are noisy as mutual translations, which hinders translation model learning. In this work, we propose a progressive learning approach for image-pivoted zero-resource machine translation. Since words are less diverse than sentences when grounded in an image, we first learn word-level translation with image pivots, and then progress to sentence-level translation, using the learned word translations to suppress noise in image-pivoted multi-lingual sentences. Experimental results on two widely used image-pivot translation datasets, IAPR-TC12 and Multi30k, show that the proposed approach significantly outperforms other state-of-the-art methods.
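To make the two-stage idea concrete, the following Python is a minimal sketch of our own, not the authors' implementation: it first builds a word-level lexicon from images described in both languages (a simple co-occurrence stand-in for learning word translation with image pivots), then uses that lexicon to filter noisy image-pivoted sentence pairs before sentence-level training. All function names, the threshold, and the toy German/English data are hypothetical.

from collections import defaultdict

def word_lexicon_from_image_tags(tagged_images):
    # Stage 1: build a source->target word lexicon from images that are
    # tagged (or captioned) in both languages; words grounded in the same
    # image are treated as translation candidates.
    counts = defaultdict(lambda: defaultdict(int))
    for src_words, tgt_words in tagged_images:
        for s in src_words:
            for t in tgt_words:
                counts[s][t] += 1
    # Keep the most frequently co-occurring target word per source word.
    return {s: max(tgts, key=tgts.get) for s, tgts in counts.items()}

def pair_score(src_sent, tgt_sent, lexicon):
    # Fraction of source words whose lexicon translation appears in the
    # target sentence: a crude adequacy signal for filtering.
    src, tgt = src_sent.split(), set(tgt_sent.split())
    hits = sum(1 for w in src if lexicon.get(w) in tgt)
    return hits / max(len(src), 1)

def filter_pivoted_pairs(pairs, lexicon, threshold=0.3):
    # Stage 2: keep only image-pivoted sentence pairs that the word-level
    # lexicon judges to be plausible mutual translations.
    return [(s, t) for s, t in pairs if pair_score(s, t, lexicon) >= threshold]

# Hypothetical toy data: German/English word sets pivoted by shared images.
tags = [({"hund", "gras"}, {"dog", "grass"}),
        ({"hund", "ball"}, {"dog", "ball"})]
lexicon = word_lexicon_from_image_tags(tags)
pairs = [("hund ball", "a dog plays with a ball"),
         ("hund gras", "two people ride bicycles")]  # noisy pivoted pair
print(filter_pivoted_pairs(pairs, lexicon))
# -> keeps only ("hund ball", "a dog plays with a ball")

In the paper, both stages are learned neural models rather than co-occurrence counts, but the sketch captures the progression: reliable word-level translation first, which then suppresses noise in the sentence-level signal.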
Keywords:
Natural Language Processing: Machine Translation
Computer Vision: Language and Vision