Efficient Sign Language Translation with a Curriculum-based Non-autoregressive Decoder

Pei Yu, Liang Zhang, Biao Fu, Yidong Chen

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 5260-5268. https://doi.org/10.24963/ijcai.2023/584

Most existing studies on Sign Language Translation (SLT) employ an AutoRegressive Decoding Mechanism (AR-DM) to generate target sentences. However, the main disadvantage of the AR-DM is its high inference latency. To address this problem, we introduce a Non-AutoRegressive Decoding Mechanism (NAR-DM) into SLT, which generates the whole sentence at once. Meanwhile, to improve its decoding ability, we integrate the advantages of curriculum learning and the NAR-DM and propose a Curriculum-based NAR Decoder (CND). Specifically, the lower layers of the CND are expected to predict simple tokens that can be predicted correctly using only source-side information, while the upper layers predict complex tokens based on the lower layers' predictions. Therefore, our CND significantly reduces the model's inference latency while maintaining competitive performance. Moreover, to further boost the performance of our CND, we propose a mutual learning framework containing two decoders: an AR decoder and our CND. We jointly train the two decoders and minimize the KL divergence between their outputs, which enables our CND to learn forward sequential knowledge from the strengthened AR decoder. Experimental results on PHOENIX2014T and CSL-Daily demonstrate that our model consistently outperforms all competitive baselines and achieves 7.92× and 8.02× speed-ups, respectively, over the AR SLT model. Our source code is available at https://github.com/yp20000921/CND.
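To make the two mechanisms in the abstract concrete, below is a minimal PyTorch sketch of (1) a curriculum-style NAR decoder in which every layer emits token predictions that are re-embedded and fed to the next layer, so upper layers can condition on tokens already resolved below, and (2) a mutual-learning loss combining cross-entropy on both decoders with a KL term between their output distributions. All module names, hyperparameters, and the symmetric form of the KL term are illustrative assumptions derived from the abstract, not the authors' implementation; the released code at the repository above is authoritative.

# Minimal sketch, assuming standard Transformer components; not the
# authors' implementation (see https://github.com/yp20000921/CND).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CurriculumNARDecoder(nn.Module):
    def __init__(self, vocab_size, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, x, memory):
        # x: (B, T, d_model) initial decoder inputs; memory: (B, S, d_model)
        # sign-video encoder states. No causal mask is applied, so every
        # target position attends to all others (non-autoregressive).
        layer_logits = []
        for layer in self.layers:
            x = layer(x, memory)
            logits = self.out_proj(x)
            layer_logits.append(logits)
            # Feed this layer's soft predictions to the next layer, so
            # upper layers build on tokens the lower layers already solved.
            x = x + F.softmax(logits, dim=-1) @ self.embed.weight
        # Return per-layer logits; the last layer is used at inference.
        return layer_logits


def mutual_learning_loss(nar_logits, ar_logits, targets, pad_id=0, alpha=1.0):
    # Cross-entropy on both decoders plus a symmetric KL term between
    # their output distributions (the exact KL direction and whether the
    # target distribution is detached are assumptions here).
    mask = targets.ne(pad_id)
    ce_nar = F.cross_entropy(nar_logits.transpose(1, 2), targets,
                             ignore_index=pad_id)
    ce_ar = F.cross_entropy(ar_logits.transpose(1, 2), targets,
                            ignore_index=pad_id)
    log_p_nar = F.log_softmax(nar_logits, dim=-1)
    log_p_ar = F.log_softmax(ar_logits, dim=-1)
    kl = (F.kl_div(log_p_nar, log_p_ar, log_target=True,
                   reduction="none").sum(-1)[mask].mean()
          + F.kl_div(log_p_ar, log_p_nar, log_target=True,
                     reduction="none").sum(-1)[mask].mean())
    return ce_nar + ce_ar + alpha * kl

In a full system, the intermediate layer_logits would typically also receive cross-entropy supervision (deep supervision), and the decoder input x is commonly initialized by soft-copying or upsampling the encoder states; the abstract does not spell out these choices, so both are assumptions in this sketch.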
Keywords:
Natural Language Processing: NLP: Machine translation and multilinguality