Cardinality-Minimal Explanations for Monotonic Neural Networks

Ouns El Harzli, Bernardo Cuenca Grau, Ian Horrocks

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 3677-3685. https://doi.org/10.24963/ijcai.2023/409

In recent years, there has been increasing interest in explanation methods for neural model predictions that offer precise formal guarantees. These include abductive (respectively, contrastive) methods, which aim to compute minimal subsets of input features that are sufficient for a given prediction to hold (respectively, to change a given prediction). The corresponding decision problems are, however, known to be intractable. In this paper, we investigate whether tractability can be regained by focusing on neural models implementing a monotonic function. Although the relevant decision problems remain intractable, we show that they become solvable in polynomial time by means of greedy algorithms if we additionally assume that the activation functions are continuous everywhere and differentiable almost everywhere. Our experiments suggest favourable performance of our algorithms.
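To illustrate why monotonicity helps, the following sketch shows a simple greedy deletion procedure for computing a subset-minimal sufficient explanation of a positive prediction. It assumes a classifier that is monotonically non-decreasing in every input feature, so the worst case over any set of "freed" features is attained by setting them all to their lower bounds, and sufficiency can be checked with a single model evaluation. This is an illustrative sketch under those assumptions, not the paper's exact cardinality-minimal algorithm; the function and parameter names are hypothetical.

```python
import numpy as np

def greedy_sufficient_subset(model, x, lower, target=1):
    """Greedy deletion for a monotonically non-decreasing classifier.

    model  -- callable mapping a feature vector to a class label (hypothetical)
    x      -- input whose prediction we want to explain
    lower  -- per-feature lower bounds of the input domain
    target -- the prediction to be preserved

    Returns a subset-minimal set of feature indices that, when fixed
    to their values in x, suffices for the prediction to hold.
    """
    n = len(x)
    fixed = set(range(n))
    for i in range(n):
        candidate = fixed - {i}
        # Free feature i (and all previously freed features) by
        # dropping them to their lower bounds; by monotonicity this
        # is the worst case for a non-decreasing model.
        probe = np.where([j in candidate for j in range(n)], x, lower)
        if model(probe) == target:   # prediction still holds
            fixed = candidate        # feature i is not needed
    return sorted(fixed)
```

For example, with `model(v) = 1 if v[0] + v[1] >= 1 else 0`, input `x = [1, 1, 1]`, and lower bounds `[0, 0, 0]`, the procedure frees features 0 and 2 and returns `[1]`: fixing only feature 1 at its value already guarantees the positive prediction. Each iteration costs one model evaluation, giving the linear number of calls that makes the greedy approach tractable.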
Keywords:
Machine Learning: ML: Explainable/Interpretable machine learning
AI Ethics, Trust, Fairness: ETF: Explainability and interpretability
Machine Learning: ML: Theory of deep learning