Efficient Context-Aware Neural Machine Translation with Layer-Wise Weighting and Input-Aware Gating

Hongfei Xu, Deyi Xiong, Josef van Genabith, Qiuhui Liu

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 3933-3940. https://doi.org/10.24963/ijcai.2020/544

Existing Neural Machine Translation (NMT) systems are generally trained on large amounts of sentence-level parallel data, and at prediction time sentences are translated independently, ignoring cross-sentence contextual information. This leads to inconsistencies between translated sentences. Context-aware models have been proposed to address this issue. However, document-level parallel data constitutes only a small fraction of the available parallel data, and many approaches build context-aware models on top of a pre-trained, frozen sentence-level translation model in a two-step training procedure. The computational cost of these approaches is usually high. In this paper, we propose to make the most of layers pre-trained on sentence-level data for contextual representation learning, reusing representations from the sentence-level Transformer and significantly reducing the cost of incorporating context into translation. We find that representations from the shallow layers of a pre-trained sentence-level encoder play a vital role in source context encoding, and propose to perform source context encoding on weighted combinations of the pre-trained encoder layers' outputs. Instead of encoding the source context and the source input separately, we propose to iteratively and jointly encode the source input and its context, and to generate input-aware context representations with a cross-attention layer and a gating mechanism that resets irrelevant information during context encoding. Our context-aware Transformer model outperforms the recent CADec [Voita et al., 2019c] on English-Russian subtitle data and is about twice as fast in training and decoding.
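
To make the two components described above concrete, here is a minimal PyTorch sketch, not the authors' implementation: the class names, the scalar softmax-normalized layer weights, and the sigmoid gate over the concatenation of the input and the attended context are assumptions about one plausible parameterization of layer-wise weighting and input-aware gating.

```python
import torch
import torch.nn as nn


class LayerWeightedCombination(nn.Module):
    """Combine the outputs of all pre-trained encoder layers using
    learned, softmax-normalized scalar weights (one weight per layer)."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_outputs):
        # layer_outputs: list of (batch, seq_len, d_model) tensors,
        # one per layer of the frozen sentence-level encoder.
        stacked = torch.stack(layer_outputs, dim=0)        # (L, B, T, D)
        w = torch.softmax(self.weights, dim=0)             # (L,)
        return (w.view(-1, 1, 1, 1) * stacked).sum(dim=0)  # (B, T, D)


class InputAwareContextGate(nn.Module):
    """Cross-attend from the source input to the context, then gate the
    attended context so irrelevant context information can be reset."""

    def __init__(self, d_model: int, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads,
                                                batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, source, context):
        # source:  (B, T_src, D) encoding of the current sentence
        # context: (B, T_ctx, D) encoding of the surrounding sentences
        attended, _ = self.cross_attn(query=source, key=context,
                                      value=context)
        g = torch.sigmoid(self.gate(torch.cat([source, attended], dim=-1)))
        # A gate value near zero resets the corresponding context features.
        return g * attended
```

In this sketch, the frozen sentence-level encoder would supply layer_outputs and source, while context would come from encoding the surrounding sentences; gate values near zero discard context features that are irrelevant to the current input, matching the gating behavior described in the abstract.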
Keywords:
Natural Language Processing: Machine Translation