Learning Meta Word Embeddings by Unsupervised Weighted Concatenation of Source Embeddings

Danushka Bollegala

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 4058-4064. https://doi.org/10.24963/ijcai.2022/563

Given multiple source word embeddings learnt using diverse algorithms and lexical resources, meta word embedding learning methods attempt to learn more accurate and wide-coverage word embeddings. Prior work on meta-embedding has repeatedly found simple vector concatenation of the source embeddings to be a competitive baseline. However, it remains unclear why and when simple vector concatenation can produce accurate meta-embeddings. We show that weighted concatenation can be seen as a spectrum matching operation between each source embedding and the meta-embedding, minimising the pairwise inner-product loss. Following this theoretical analysis, we propose two unsupervised methods to learn the optimal concatenation weights for creating meta-embeddings from a given set of source embeddings. Experimental results on multiple benchmark datasets show that the proposed weighted-concatenation meta-embedding methods outperform previously proposed meta-embedding learning methods.
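As a rough illustration of the idea described in the abstract, the following NumPy sketch builds a meta-embedding by weighted concatenation of two source embedding matrices and measures the pairwise inner-product (PIP) loss between a source and the meta-embedding. The weights here are illustrative placeholders, not the output of the paper's unsupervised weight-learning methods, and the names (S1, S2, meta_embed, pip_loss) are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: two source embedding matrices over the same
# vocabulary of n words, with rows aligned across sources.
rng = np.random.default_rng(0)
n, d1, d2 = 100, 50, 30
S1 = rng.standard_normal((n, d1))   # e.g. a GloVe-like source
S2 = rng.standard_normal((n, d2))   # e.g. an SGNS-like source

def meta_embed(sources, weights):
    """Weighted concatenation: scale each source embedding matrix by its
    concatenation weight, then join them column-wise to form the
    meta-embedding matrix."""
    return np.concatenate([w * S for w, S in zip(weights, sources)], axis=1)

def pip_loss(S, M):
    """Pairwise inner-product loss between a source embedding S and the
    meta-embedding M: the Frobenius norm of the gap between their
    Gram (pairwise inner-product) matrices."""
    return np.linalg.norm(S @ S.T - M @ M.T, ord="fro")

# Compare plain concatenation against an (arbitrarily chosen) weighted one;
# the paper's contribution is learning such weights without supervision.
M_plain = meta_embed([S1, S2], [1.0, 1.0])
M_weighted = meta_embed([S1, S2], [0.8, 1.2])  # illustrative weights only
print(pip_loss(S1, M_plain), pip_loss(S1, M_weighted))
```

Because concatenation is column-wise, scaling a source by a weight rescales its contribution to every pairwise inner product in the meta-embedding, which is why choosing the weights amounts to matching the spectra of the source and meta Gram matrices.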
Keywords:
Natural Language Processing: Embeddings
Natural Language Processing: Natural Language Semantics
Machine Learning: Representation learning
Machine Learning: Theory of Deep Learning