AttExplainer: Explain Transformer via Attention by Reinforcement Learning

Runliang Niu, Zhepei Wei, Yan Wang, Qi Wang

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 724-731. https://doi.org/10.24963/ijcai.2022/102

Transformer and its variants, built on attention mechanisms, have recently achieved remarkable performance on many NLP tasks. Most existing work on Transformer explanation reveals and utilizes the attention matrix qualitatively, guided by human subjective intuition. However, the high dimensionality of the attention matrix makes quantitative analysis difficult for these methods. In this paper, we therefore propose AttExplainer, a novel reinforcement learning (RL) based framework that explains Transformers via the attention matrix. The RL agent learns to perform step-by-step masking operations by observing changes in the attention matrices. We adapt our method to two scenarios: perturbation-based model explanation and text adversarial attack. Experiments on three widely used text classification benchmarks validate the effectiveness of the proposed method against state-of-the-art baselines. Additional studies show that our method is highly transferable and consistent with human intuition. The code of this paper is available at https://github.com/niuzaisheng/AttExplainer.
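The core idea in the abstract, an agent that masks tokens step by step and observes how the attention matrix changes, can be illustrated with a minimal sketch. Note this is NOT the authors' implementation: the `attention` function below is a made-up toy stand-in for a real Transformer's attention, and the agent is a simple greedy policy rather than a trained RL policy.

```python
# Conceptual sketch of step-by-step masking driven by attention change.
# Assumption: `attention` is a toy proxy (each row attends to unmasked
# tokens in proportion to a scalar salience value), not a real model.

def attention(vals):
    """Return a toy n x n attention matrix; masked positions are None."""
    n = len(vals)
    s = sum(v for v in vals if v is not None)
    row = [(v / s if v is not None else 0.0) for v in vals] if s else [0.0] * n
    return [list(row) for _ in range(n)]

def matrix_change(a, b):
    """Total absolute change between two attention matrices."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def greedy_masking(vals, steps):
    """Mask `steps` tokens, each time picking the position whose removal
    perturbs the attention matrix most (a greedy proxy for the RL policy)."""
    vals = list(vals)
    order = []
    for _ in range(steps):
        base = attention(vals)
        best, best_delta = None, -1.0
        for i, v in enumerate(vals):
            if v is None:
                continue  # already masked
            trial = vals[:i] + [None] + vals[i + 1:]
            delta = matrix_change(base, attention(trial))
            if delta > best_delta:
                best, best_delta = i, delta
        vals[best] = None  # apply the chosen masking operation
        order.append(best)
    return order
```

Under this toy proxy, masking a high-salience token shifts the attention distribution the most, so the greedy agent tends to mask influential tokens first; the paper's RL agent instead learns such a policy from observed attention changes rather than enumerating candidates.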
Keywords:
AI Ethics, Trust, Fairness: Explainability and Interpretability
Natural Language Processing: Interpretability and Analysis of Models for NLP
Natural Language Processing: Text Classification