mdfa: Multi-Differential Fairness Auditor for Black Box Classifiers

Xavier Gitiaux, Huzefa Rangwala

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Special Track on AI for Improving Human Well-being. Pages 5871-5879. https://doi.org/10.24963/ijcai.2019/814

Machine learning algorithms are increasingly involved in sensitive decision-making processes with adverse implications for individuals. This paper presents a new tool, mdfa, that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness. Multi-differential fairness is a guarantee that a black-box classifier's outcomes do not leak information about the sensitive attributes of a small group of individuals. We reduce the problem of identifying worst-case violations to matching distributions and predicting where sensitive attributes and the classifier's outcomes coincide. We apply mdfa to a recidivism risk assessment classifier widely used in the United States and demonstrate that, among individuals with little criminal history, African-Americans identified by mdfa are three times more likely to be considered at high risk of violent recidivism than similar non-African-Americans.
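
As a rough illustration of the reduction described above, the sketch below audits a simulated black-box classifier in Python with scikit-learn. The synthetic data, the inverse-propensity weighting used for distribution matching, and the decision-tree auditor are all assumptions made for this example; it is not the authors' implementation of mdfa.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    # Synthetic audit set: features X, a binary sensitive attribute s, and the
    # black-box classifier's outcome y_hat (simulated here with a deliberate
    # dependence on s, so that a violation exists to be found).
    n = 5000
    X = rng.normal(size=(n, 4))
    s = rng.integers(0, 2, size=n)
    y_hat = (X[:, 0] + 0.8 * s + rng.normal(scale=0.5, size=n) > 0).astype(int)

    # Step 1, distribution matching: inverse-propensity weights make the two
    # sensitive groups comparable on X, so the auditor cannot succeed merely
    # by exploiting covariate differences between the groups.
    p = LogisticRegression().fit(X, s).predict_proba(X)[:, 1]
    w = np.where(s == 1, 1.0 / p, 1.0 / (1.0 - p))

    # Step 2, agreement prediction: label each point by whether the sensitive
    # attribute and the classifier's outcome coincide, then train an auditor
    # to predict that label from X. Held-out accuracy meaningfully above the
    # base agreement rate signals that outcomes leak the sensitive attribute.
    agree = (s == y_hat).astype(int)
    X_tr, X_te, a_tr, a_te, w_tr, w_te = train_test_split(
        X, agree, w, random_state=0)
    auditor = DecisionTreeClassifier(max_depth=3).fit(
        X_tr, a_tr, sample_weight=w_tr)
    acc = np.average(auditor.predict(X_te) == a_te, weights=w_te)
    base = np.average(a_te, weights=w_te)
    print(f"auditor accuracy {acc:.3f} vs. base agreement rate "
          f"{max(base, 1 - base):.3f}")

Leaves of the fitted tree whose weighted agreement rate sits far from the base rate describe candidate subgroups, which is roughly the sense in which such an audit characterizes the victims of discrimination.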
Keywords:
Special Track on AI for Improving Human Well-being: AI ethics
Special Track on AI for Improving Human Well-being: Societal applications
Special Track on AI for Improving Human Well-being: Well-being metrics