Text Rewriting Improves Semantic Role Labeling (Extended Abstract)

Kristian Woodsend, Mirella Lapata

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Journal track. Pages 5095-5099. https://doi.org/10.24963/ijcai.2017/729

Large-scale annotated corpora are a prerequisite for developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, and often demand linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses rewrite rules automatically extracted from comparable corpora and bitexts to generate multiple versions of sentences annotated with gold standard labels. We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset.
Keywords:
Natural Language Processing: Tagging, chunking, syntax, and parsing
Natural Language Processing: Natural Language Generation
Natural Language Processing: Natural Language Semantics
Natural Language Processing: Natural Language Processing