Explaining Deep Neural Network Models with Adversarial Gradient Integration
Deng Pan, Xin Li, Dongxiao Zhu
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 2876-2883.
https://doi.org/10.24963/ijcai.2021/396
Deep neural networks (DNNs) have become one of the highest-performing tools across a broad range of machine learning areas. However, the multilayer non-linearity of their architectures prevents us from gaining a better understanding of the models' predictions. Gradient-based attribution methods (e.g., Integrated Gradients (IG)) that decipher input features' contributions to the prediction task have been shown to be highly effective, yet they require a reference input as the anchor for explaining the model's output. The quality of DNN model interpretation can be quite inconsistent with regard to the choice of reference. Here we propose an Adversarial Gradient Integration (AGI) method that integrates the gradients from adversarial examples to the target example along the curve of steepest ascent to calculate the resulting contributions from all input features. Our method does not rely on the choice of a reference, hence it avoids the ambiguity and inconsistency arising from reference selection. We demonstrate the performance of our AGI method and compare it with competing methods in explaining image classification results. Code is available from https://github.com/pd90506/AGI.
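The sketch below illustrates the idea described in the abstract in PyTorch: starting from the input, take steepest-ascent steps toward an adversarial example of a false class and accumulate the gradient of the target class along that path as the attribution. This is a minimal, hedged sketch under assumptions (a single false class, a fixed step size and step count, and log-probability gradients); it is not the authors' exact algorithm, which is specified in the paper and in the linked repository.

```python
import torch
import torch.nn.functional as F


def agi_style_attribution(model, x, target_class, false_class,
                          steps=20, step_size=0.02):
    """Sketch of adversarial-gradient-integration-style attribution.

    Walks from the input toward an adversarial example of `false_class`
    by steepest ascent on that class's log-probability, and accumulates
    the (negative) target-class gradient times each step as a path
    integral approximation. All hyperparameters here are illustrative.
    """
    model.eval()
    x_adv = x.clone().detach()
    attribution = torch.zeros_like(x)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        log_probs = F.log_softmax(model(x_adv), dim=1)

        # Gradient of the target (true) class at the current point.
        grad_target = torch.autograd.grad(
            log_probs[:, target_class].sum(), x_adv, retain_graph=True)[0]

        # Gradient of the false class defines the steepest-ascent direction.
        grad_false = torch.autograd.grad(
            log_probs[:, false_class].sum(), x_adv)[0]

        # Normalized adversarial step toward the false class.
        delta = step_size * grad_false / (grad_false.norm() + 1e-8)

        # Accumulate -grad_target * delta along the adversarial path.
        attribution += -grad_target.detach() * delta.detach()

        x_adv = (x_adv + delta).detach()

    return attribution
```

Because the path is generated by the model's own adversarial dynamics rather than by interpolating toward a user-chosen baseline, no reference input is needed; the paper additionally aggregates over multiple false classes, which the single `false_class` argument above only stands in for.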
Keywords:
Machine Learning: Adversarial Machine Learning
Machine Learning: Explainable/Interpretable Machine Learning
AI Ethics, Trust, Fairness: Explainability