Benchmarking eXplainable AI - A Survey on Available Toolkits and Open Challenges


Phuong Quynh Le, Meike Nauta, Van Bach Nguyen, Shreyasi Pathak, Jörg Schlötterer, Christin Seifert

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, Survey Track. Pages 6665-6673. https://doi.org/10.24963/ijcai.2023/747

The goal of Explainable AI (XAI) is to make the reasoning of a machine learning model accessible to humans, such that users of an AI system can evaluate and judge the underlying model. Due to the black-box nature of XAI methods, it is, however, hard to disentangle the contribution of the model and the explanation method to the final output. It might be unclear whether an unexpected output is caused by the model or by the explanation method. Explanation methods, therefore, need to be evaluated in technical (e.g. fidelity to the model) and user-facing (e.g. correspondence to domain knowledge) terms. A recent survey has identified 29 different automated approaches to quantitatively evaluate explanations. In this work, we take an additional perspective and analyse which toolkits and data sets are available. We investigate which evaluation metrics are implemented in the toolkits and whether they produce the same results. We find that only a few aspects of explanation quality are currently covered, data sets are rare, and evaluation results are not comparable across different toolkits. Our survey can serve as a guide for the XAI community in identifying future directions of research, most notably the standardisation of evaluation.
Keywords:
Survey: Machine Learning
Survey: Humans and AI
Survey: AI Ethics, Trust, Fairness