Learning Paraphrase Identification with Structural Alignment
Chen Liang, Praveen Paritosh, Vinodh Rajendran, Kenneth D. Forbus
Semantic similarity of text plays an important role in many NLP tasks. Estimating it requires both local information, such as lexical semantics, and structural information, such as syntactic structure. Recent progress in word representation provides good resources for lexical semantics, and advances in natural language analysis tools make it possible to generate syntactic and semantic annotations efficiently. However, how to combine them to capture the semantics of text is still an open question. Here, we propose a new alignment-based approach to learning semantic similarity. It uses a hybrid representation, attributed relational graphs, to encode lexical, syntactic, and semantic information. Alignment of two such graphs combines local and structural information to support similarity estimation. To improve alignment, we introduce structural constraints inspired by a cognitive theory of similarity and analogy. Because training data usually provide only similarity labels and the true alignments are unknown, we address the learning problem with two approaches: alignment as feature extraction and alignment as a latent variable. We evaluate our approach on the paraphrase identification task, where it achieves results competitive with the state of the art.
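To make the idea of alignment as feature extraction concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes a toy attributed relational graph whose nodes carry small hand-written vectors in place of word embeddings and whose edges carry dependency-style labels. The names `Graph`, `align`, `structural_support`, and `alignment_features`, the greedy one-to-one matching, and the `structure_weight` parameter are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy attributed relational graph (illustrative assumption)."""
    nodes: dict                                 # node id -> attribute vector
    edges: dict = field(default_factory=dict)   # (head id, dependent id) -> label


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def structural_support(g1, g2, a, b, alignment):
    """Count same-label edges that link (a, b) to an already aligned pair."""
    count = 0
    for (h1, d1), label1 in g1.edges.items():
        for (h2, d2), label2 in g2.edges.items():
            if label1 != label2:
                continue
            if h1 == a and h2 == b and alignment.get(d1) == d2:
                count += 1
            if d1 == a and d2 == b and alignment.get(h1) == h2:
                count += 1
    return count


def align(g1, g2, structure_weight=0.5):
    """Greedy one-to-one alignment scored by local plus structural evidence."""
    alignment = {}
    free1, free2 = set(g1.nodes), set(g2.nodes)
    while free1 and free2:
        # Sorting keeps tie-breaking deterministic across runs.
        pairs = [(a, b) for a in sorted(free1) for b in sorted(free2)]
        a, b = max(
            pairs,
            key=lambda p: cosine(g1.nodes[p[0]], g2.nodes[p[1]])
            + structure_weight * structural_support(g1, g2, p[0], p[1], alignment),
        )
        alignment[a] = b
        free1.remove(a)
        free2.remove(b)
    return alignment


def alignment_features(g1, g2, alignment):
    """Return [mean local similarity, fraction of relations preserved]."""
    local = sum(
        cosine(g1.nodes[a], g2.nodes[b]) for a, b in alignment.items()
    ) / max(len(g1.nodes), len(g2.nodes))
    preserved = sum(
        1
        for (h, d), label in g1.edges.items()
        if g2.edges.get((alignment.get(h), alignment.get(d))) == label
    )
    structural = preserved / max(len(g1.edges), len(g2.edges), 1)
    return [local, structural]


# "cat chased mouse" vs. "kitten pursued mouse" vs. "mouse chased cat"
g_cat = Graph(
    nodes={"chased": [1, 0, 0], "cat": [0, 1, 0], "mouse": [0, 0, 1]},
    edges={("chased", "cat"): "nsubj", ("chased", "mouse"): "dobj"},
)
g_kitten = Graph(
    nodes={"pursued": [0.9, 0.1, 0], "kitten": [0.1, 0.9, 0], "mouse": [0, 0, 1]},
    edges={("pursued", "kitten"): "nsubj", ("pursued", "mouse"): "dobj"},
)
g_swapped = Graph(
    nodes={"chased": [1, 0, 0], "mouse": [0, 0, 1], "cat": [0, 1, 0]},
    edges={("chased", "mouse"): "nsubj", ("chased", "cat"): "dobj"},
)

paraphrase_feats = alignment_features(g_cat, g_kitten, align(g_cat, g_kitten))
swapped_feats = alignment_features(g_cat, g_swapped, align(g_cat, g_swapped))
```

In this toy setting the role-swapped sentence matches the original perfectly at the word level, yet none of its relations are preserved under the alignment, whereas the paraphrase preserves all of them. Feature vectors of this kind could then be passed to any standard classifier trained on the similarity labels; the latent-variable formulation mentioned above is not covered by this sketch.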