Evaluating Natural Language Generation via Unbalanced Optimal Transport
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 3730-3736. https://doi.org/10.24963/ijcai.2020/516
Embedding-based evaluation measures have improved correlation with human judgments in natural language generation. These measures rely on various intrinsic metrics, including generalized precision, recall, F-score and the earth mover's distance. However, the relations between these metrics are unclear, which makes it difficult to decide which measure to use in real applications. In this paper, we provide an in-depth study of the relations between these metrics. Inspired by optimal transport theory, we prove that these metrics correspond to the optimal transport problem under different hard marginal constraints. These hard marginal constraints, however, can cause incomplete and noisy matching in the evaluation process. We therefore propose a family of new evaluation metrics, namely Lazy Earth Mover's Distances, based on the more general unbalanced optimal transport problem. Experimental results on WMT18 and WMT19 show that our proposed metrics produce evaluation results that are more consistent with human judgments than those of existing intrinsic metrics.
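The link between matching-based metrics and optimal transport can be illustrated with a small sketch. It is not the paper's formulation, and it does not implement the Lazy Earth Mover's Distances. It only shows one standard fact: when the transport plan is constrained on a single marginal (the mass leaving each source token is fixed, the mass arriving at each target token is free), the optimal plan sends each token's mass to its cheapest partner, so the transport cost equals one minus a greedy-matching precision or recall. The toy similarity matrix, the uniform token weights, the choice cost = 1 - similarity, and all function names are assumptions made for illustration.

```python
# Illustrative sketch only: toy numbers and helper names are hypothetical.

def one_sided_ot_cost(cost, weights):
    """Minimum transport cost when only the row (source) marginal is fixed.

    With no constraint on column sums, each row is free to send all of its
    mass to its cheapest column, so the minimum is a weighted sum of row minima.
    """
    return sum(w * min(row) for w, row in zip(weights, cost))


def greedy_score(sim, weights):
    """Greedy matching: each row token is scored by its most similar column token."""
    return sum(w * max(row) for w, row in zip(weights, sim))


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


# Assumed toy similarities: rows = candidate tokens, columns = reference tokens.
sim = [[0.9, 0.2, 0.1],
       [0.3, 0.8, 0.4]]
cost = [[1.0 - s for s in row] for row in sim]  # assumed cost = 1 - similarity

cand_w = [1.0 / 2] * 2  # uniform weights, summing to 1
ref_w = [1.0 / 3] * 3

precision = greedy_score(sim, cand_w)           # candidate side constrained
recall = greedy_score(transpose(sim), ref_w)    # reference side constrained
f_score = 2 * precision * recall / (precision + recall)

# The one-sided transport costs match 1 - precision and 1 - recall.
ot_precision_cost = one_sided_ot_cost(cost, cand_w)
ot_recall_cost = one_sided_ot_cost(transpose(cost), ref_w)
```

In this toy case the third reference token is matched only weakly on the recall side and not at all on the precision side, which hints at how the choice of marginal constraint changes which tokens are forced to match. The earth mover's distance fixes both marginals at once and needs a linear-program or network-flow solver, which this sketch leaves out.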
Natural Language Processing: Natural Language Generation
Natural Language Processing: Machine Translation
Natural Language Processing: Dialogue