Unsupervised Learning of an IS-A Taxonomy from a Limited Domain-Specific Corpus
Daniele Alfarone, Jesse Davis
Taxonomies hierarchically organize concepts in a domain. Building and maintaining them by hand is a tedious and time-consuming task. This paper proposes a novel, unsupervised algorithm for automatically learning an IS-A taxonomy from scratch by analyzing a given text corpus. Our approach is designed to deal with infrequently occurring concepts, so it can effectively induce taxonomies even from small corpora. Algorithmically, the approach makes two important contributions. First, it performs inference based on clustering and distributional semantics, which can capture links among concepts that are never mentioned together. Second, it uses a novel graph-based algorithm to detect and remove incorrect is-a relations from a taxonomy. An empirical evaluation on five corpora demonstrates the utility of our proposed approach.