ALaSca: an Automated approach for Large-Scale Lexical Substitution

ALaSca: an Automated approach for Large-Scale Lexical Substitution

Caterina Lacerra, Tommaso Pasini, Rocco Tripodi, Roberto Navigli

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 3836-3842. https://doi.org/10.24963/ijcai.2021/528

The lexical substitution task aims at finding suitable replacements for words in context. It has proved to be useful in several areas, such as word sense induction and text simplification, as well as in more practical applications such as writing-assistant tools. However, the paucity of annotated data has forced researchers to apply mainly unsupervised approaches, limiting the applicability of large pre-trained models and thus hampering the potential benefits of supervised approaches to the task. In this paper, we mitigate this issue by proposing ALaSca, a novel approach to automatically creating large-scale datasets for English lexical substitution. ALaSca allows examples to be produced for potentially any word in a language vocabulary and to cover most of the meanings it lists. Thanks to this, we can unleash the full potential of neural architectures and finetune them on the lexical substitution task. Indeed, when using our data, a transformer-based model performs substantially better than when using manually annotated data only. We release ALaSca at https://sapienzanlp.github.io/alasca/.
Keywords:
Natural Language Processing: Natural Language Semantics
Natural Language Processing: Resources and Evaluation