Annealing Genetic-based Preposition Substitution for Text Rubbish Example Generation

Chen Li, Xinghao Yang, Baodi Liu, Weifeng Liu, Honglong Chen

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 5122-5130. https://doi.org/10.24963/ijcai.2023/569

Modern Natural Language Processing (NLP) models exhibit under-sensitivity to text rubbish examples. A text rubbish example is a heavily modified input text that is nonsensical to humans yet does not change the model's prediction. Prior work crafts rubbish examples by iteratively deleting words, determining the deletion order with beam search. However, the resulting rubbish examples usually reduce model confidence and sometimes remain human-readable. To address these problems, we propose an Annealing Genetic-based Preposition Substitution (AGPS) algorithm for text rubbish example generation with two major merits. First, AGPS crafts rubbish text examples by substituting input words with meaningless prepositions instead of deleting them outright, which degrades the model's confidence less. Second, we design an Annealing Genetic algorithm to optimize the word replacement priority, which allows the Genetic Algorithm (GA) to escape local optima with some probability. This is significant for achieving better objectives, i.e., a high word modification rate together with high model confidence. Experimental results on five popular datasets demonstrate the superiority of AGPS over the baseline and expose the fact that NLP models cannot truly understand sentence semantics, as they give the same prediction, with even higher confidence, for nonsensical preposition sequences.
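The abstract outlines the two core ideas: replacing words with meaningless prepositions rather than deleting them, and using an annealing-style acceptance rule inside a genetic search to escape local optima. A minimal sketch of such a loop is shown below; it is an illustration under assumed details, not the paper's actual method. The fitness function, preposition list, population sizes, cooling schedule, and the toy `toy_confidence` stand-in for a real classifier's confidence are all hypothetical choices for demonstration.

```python
import math
import random

# Hypothetical pool of "meaningless" replacement tokens.
PREPOSITIONS = ["of", "in", "to", "for", "with", "on", "at", "by"]

def toy_confidence(tokens):
    # Stand-in for a real model's confidence score; a genuine AGPS run
    # would query the victim NLP classifier here.
    return 1.0 if "good" in tokens else 0.5

def fitness(candidate, original, model_conf):
    # Joint objective from the abstract: maximize the word modification
    # rate while keeping the model's confidence high.
    modified = sum(a != b for a, b in zip(candidate, original))
    return modified / len(original) + model_conf(candidate)

def mutate(tokens, rng):
    # Substitute one randomly chosen word with a random preposition
    # (substitution, not deletion, so sentence length is preserved).
    child = list(tokens)
    child[rng.randrange(len(child))] = rng.choice(PREPOSITIONS)
    return child

def crossover(a, b, rng):
    # Single-point crossover between two parent token sequences.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def agps_sketch(sentence, model_conf, pop_size=20, generations=50,
                t0=1.0, cooling=0.95, seed=0):
    rng = random.Random(seed)
    original = sentence.split()
    population = [mutate(original, rng) for _ in range(pop_size)]
    best = max(population, key=lambda c: fitness(c, original, model_conf))
    temperature = t0
    for _ in range(generations):
        parents = sorted(population,
                         key=lambda c: fitness(c, original, model_conf),
                         reverse=True)[:pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = mutate(crossover(a, b, rng), rng)
            delta = (fitness(child, original, model_conf)
                     - fitness(a, original, model_conf))
            # Annealing step: occasionally accept a worse child, with a
            # probability that shrinks as the temperature cools, so the
            # GA can jump out of local optima.
            if delta > 0 or rng.random() < math.exp(delta / temperature):
                children.append(child)
        population = children
        temperature *= cooling
        cand = max(population, key=lambda c: fitness(c, original, model_conf))
        if (fitness(cand, original, model_conf)
                > fitness(best, original, model_conf)):
            best = cand
    return " ".join(best)
```

For example, `agps_sketch("the movie was very good indeed", toy_confidence)` returns a same-length sentence in which most words have been turned into prepositions; with a real model in place of `toy_confidence`, the search would favor candidates that keep the original prediction confident.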
Keywords:
Natural Language Processing: Interpretability and analysis of models for NLP
Natural Language Processing: Text classification