Database-Text Alignment via Structured Multilabel Classification

Benjamin Snyder, Regina Barzilay

This paper addresses the task of aligning a database with a corresponding text. The goal is to link individual database entries with sentences that verbalize the same information. By providing explicit semantics-to-text links, these alignments can aid the training of natural language generation and information extraction systems. Beyond these pragmatic benefits, the alignment problem is appealing from a modeling perspective: the mappings between database entries and text sentences exhibit rich structural dependencies that are unique to this task. The key challenge is therefore to exploit as many global dependencies as possible without sacrificing tractability. To this end, we cast text-database alignment as a structured multilabel classification task in which each sentence is labeled with a subset of matching database entries. In contrast to existing multilabel classifiers, our approach operates over arbitrary global features of inputs and proposed labels. We compare our model with a baseline classifier that makes locally optimal decisions. Our results show that the proposed model yields a 15% relative reduction in error and compares favorably with human performance.
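The contrast between locally optimal decisions and scoring a whole label set can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the model described in the paper: the database entries, the word-overlap scorer `local_score`, the hypothetical global feature in `global_score` (a penalty when a label set mixes entries about different players), and the exhaustive search over small subsets.

```python
from itertools import combinations

# Hypothetical toy database: (entry type, player, value).
ENTRIES = [
    ("rushing", "smith", "120"),
    ("touchdown", "smith", "1"),
    ("rushing", "jones", "120"),
]

def local_score(sentence, entry):
    """Assumed local scorer: count of entry fields found in the sentence, minus 0.5."""
    tokens = set(sentence.lower().split())
    return sum(field in tokens for field in entry) - 0.5

def global_score(sentence, subset):
    """Assumed global feature: penalize label sets that mix several players."""
    players = {entry[1] for entry in subset}
    return -1.0 * max(0, len(players) - 1)

def local_labels(sentence, entries):
    """Local decisions: keep every entry whose own score is positive."""
    return frozenset(e for e in entries if local_score(sentence, e) > 0)

def global_labels(sentence, entries, max_size=3):
    """Structured decision: score each candidate subset as a whole.

    Exhaustive enumeration is used only because the toy input is tiny.
    """
    best, best_score = frozenset(), float("-inf")
    for k in range(0, max_size + 1):
        for combo in combinations(entries, k):
            subset = frozenset(combo)
            total = sum(local_score(sentence, e) for e in subset)
            total += global_score(sentence, subset)
            if total > best_score:
                best, best_score = subset, total
    return best

sentence = "Smith ran for 120 yards and scored a touchdown"
local_result = local_labels(sentence, ENTRIES)
global_result = global_labels(sentence, ENTRIES)
```

In this toy setting the local decisions also select the `jones` entry, because the value `120` matches on its own, while the subset-level score drops it once the mixed-player penalty is applied.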