Robust Interpretable Text Classification against Spurious Correlations Using AND-rules with Negation

Rohan Kumar Yadav, Lei Jiao, Ole-Christoffer Granmo, Morten Goodwin

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 4439-4446. https://doi.org/10.24963/ijcai.2022/616

State-of-the-art natural language processing models have raised the bar for performance on a variety of tasks in recent years. However, concerns are growing over their sensitivity to distribution biases in the training and test data. This sensitivity substantially degrades performance on out-of-distribution and counterfactual data. The root cause appears to be that many machine learning models are prone to learning shortcuts, modelling simple correlations rather than more fundamental and general relationships. As a result, such text classifiers tend to perform poorly when a human makes minor modifications to the data, raising questions about their robustness. In this paper, we employ a rule-based architecture, the Tsetlin Machine (TM), that learns both simple and complex correlations by ANDing features and their negations. It thereby produces explainable AND-rules built from negated and non-negated reasoning. We explore how non-negated reasoning can be more prone to distribution biases than negated reasoning, and we leverage this finding by adapting the TM architecture to perform mainly negated reasoning via the specificity parameter s. As a result, the AND-rules become robust to spurious correlations and can also correctly predict counterfactual data. Our empirical investigation of the model's robustness uses the specificity s to control the degree of negated reasoning. Experiments on publicly available Counterfactually-Augmented Data demonstrate that the negated clauses are robust to spurious correlations and outperform Naive Bayes, SVM, and Bi-LSTM by up to 20%, and ELMo by almost 6%, on counterfactual test data.
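
To make the rule mechanism concrete, below is a minimal, self-contained Python sketch (not the authors' implementation; the vocabulary, the example rule, and the helper name are hypothetical) of how a TM-style AND-rule with negated literals evaluates a binarized bag-of-words document. A rule that relies mainly on negated literals keys on the absence of contrary evidence rather than on the presence of potentially spurious keywords, which is the intuition behind biasing clauses toward negated reasoning.

# Hypothetical illustration of a TM-style AND-rule (clause) with negation.
# A document is a binarized bag of words: word -> 0/1.

def clause_fires(doc, positive_literals, negated_literals):
    """The clause fires only if every non-negated literal is present (1)
    and every negated literal is absent (0) in the document."""
    return (all(doc.get(w, 0) == 1 for w in positive_literals)
            and all(doc.get(w, 0) == 0 for w in negated_literals))

# A mostly-negated rule for "positive sentiment": it keys on the absence
# of negative markers instead of the presence of positive keywords that
# may only be spuriously correlated with the label.
positive_literals = []
negated_literals = ["boring", "awful", "waste"]

original = {"great": 1, "plot": 1}                     # unedited review
counterfactual = {"great": 1, "plot": 1, "boring": 1}  # human edit flips the label

print(clause_fires(original, positive_literals, negated_literals))        # True
print(clause_fires(counterfactual, positive_literals, negated_literals))  # False

Because the rule fires through the absence of negative markers, a counterfactual edit that introduces one of them correctly switches the rule off, whereas a rule anchored on spurious positive keywords would keep firing on the edited document.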
Keywords:
Natural Language Processing: Text Classification
Machine Learning: Robustness
Natural Language Processing: Applications
Natural Language Processing: Interpretability and Analysis of Models for NLP