Parse Tree Fragmentation of Ungrammatical Sentences
Homa B. Hashemi, Rebecca Hwa
Ungrammatical sentences present challenges for statistical parsers because the well-formed trees they produce may not be appropriate for these sentences. We introduce a framework for reviewing the parses of ungrammatical sentences and extracting the coherent parts whose syntactic analyses make sense. We call this task parse tree fragmentation. In this paper, we propose a training methodology for fragmenting parse trees without using a task-specific annotated corpus. We also propose several fragmentation strategies and compare their performance on an extrinsic task, fluency judgment, in two domains: English as a Second Language (ESL) and machine translation (MT). Experimental results show that the proposed fragmentation strategies are competitive with existing methods for making fluency judgments; they also suggest that the overall framework is a promising way to handle syntactically unusual sentences.
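To make the notion of a fragment concrete, the following minimal sketch shows one way a parse could be split into coherent pieces once some arcs have been judged implausible. It is an illustration only, not the training methodology or any of the strategies proposed in the paper. It assumes a dependency-style tree encoded as a list of head indices, and the function name `fragment_tree`, the toy sentence, its parse, and the choice of which arc to cut are all hypothetical.

```python
from collections import defaultdict


def fragment_tree(heads, cut_arcs):
    """Split a dependency tree into fragments by removing selected arcs.

    heads[i] is the 0-based index of token i's head, or -1 for the root.
    cut_arcs is a set of dependent indices whose arc to their head is removed.
    Returns a sorted list of fragments, each a sorted list of token indices.
    """
    def fragment_root(i):
        # Climb toward the root until reaching a cut arc or the tree root.
        while heads[i] != -1 and i not in cut_arcs:
            i = heads[i]
        return i

    groups = defaultdict(list)
    for i in range(len(heads)):
        groups[fragment_root(i)].append(i)
    return sorted(groups.values())


# Hypothetical ungrammatical sentence and parse (assumed for illustration).
tokens = ["I", "want", "to", "home", "go"]
heads = [1, -1, 4, 4, 1]  # I->want, want=root, to->go, home->go, go->want
cut = {3}                 # assumed decision: the arc home->go is implausible

fragments = fragment_tree(heads, cut)
for frag in fragments:
    print([tokens[i] for i in frag])
```

With the assumed cut, the tree separates into one fragment covering "I want to ... go" and a second fragment containing only "home". How the arcs to cut are chosen is the substance of the paper and is deliberately left out of this sketch.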