Enhancing Semantic Representations of Bilingual Word Embeddings with Syntactic Dependencies

Linli Xu, Wenjun Ouyang, Xiaoying Ren, Yang Wang, Liang Jiang

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 4517-4524. https://doi.org/10.24963/ijcai.2018/628

Cross-lingual representation is a technique that both represents different languages in the same latent vector space and enables knowledge transfer across languages. To learn such representations, most existing works require parallel sentences with word-level alignments and assume that aligned words have similar Bag-of-Words (BoW) contexts. However, due to differences in grammatical structure across languages, the contexts of aligned words may appear at different positions in their respective sentences. To address these syntactic differences across languages, we propose a model of bilingual word embeddings integrating syntactic dependencies (DepBiWE), which uses dependency parse trees to encode the accurate relative positions of the contexts of aligned words. In addition, a new method is proposed to learn bilingual word embeddings jointly from dependency-based contexts and BoW contexts. Extensive experimental results on a real-world dataset clearly validate the superiority of the proposed DepBiWE model on various natural language processing (NLP) tasks.
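
The abstract contrasts linear BoW window contexts with dependency-based contexts. The sketch below illustrates how the two kinds of (word, context) pairs could be extracted for the joint setting, using spaCy's dependency parser; the function names and pair format are illustrative assumptions in the spirit of dependency-based embeddings, not the authors' actual implementation.

```python
# Illustrative sketch only: extracting BoW and dependency-based
# (word, context) pairs, as assumed inputs for joint embedding training.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")


def bow_contexts(sentence, window=2):
    """Yield (word, context) pairs from a fixed linear window (BoW).

    Contexts depend only on surface word order, so they shift when
    grammar reorders a sentence across languages.
    """
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield (word, tokens[j])


def dependency_contexts(sentence):
    """Yield (word, context) pairs from a dependency parse.

    Each word is paired with its syntactic head and its dependents,
    labeled with the dependency relation, so contexts reflect
    grammatical structure rather than linear position.
    """
    for token in nlp(sentence):
        if token.dep_ == "ROOT":
            continue
        # dependent side: word <--relation-- head
        yield (token.text, f"{token.head.text}/{token.dep_}")
        # head side: the inverse relation, marked with "-1"
        yield (token.head.text, f"{token.text}/{token.dep_}-1")


if __name__ == "__main__":
    sent = "The striped cat chased a mouse"
    print(list(bow_contexts(sent)))
    print(list(dependency_contexts(sent)))
```

In this reading, "chased" keeps "cat/nsubj-1" and "mouse/dobj-1" as contexts regardless of where subject and object land in the sentence, which is the property the paper exploits when BoW positions diverge across languages.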
Keywords:
Natural Language Processing: Natural Language Processing
Natural Language Processing: NLP Applications and Tools
Natural Language Processing: Text Classification
Natural Language Processing: Embeddings