Imprecise Oracles Impose Limits to Predictability in Supervised Learning (Extended Abstract)

Anjali Sifar, Nisheeth Srivastava

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Sister Conferences Best Papers. Pages 4834-4838. https://doi.org/10.24963/ijcai.2021/661

Supervised learning operates on the premise that labels unambiguously represent ground truth. This premise is reasonable in domains where a high degree of consensus is easily reached for any given data record, e.g., in agreeing on whether an image contains an elephant. However, there are several domains in which people disagree with each other on the appropriate label to assign to a record, e.g., whether a tweet is toxic. We argue that data labeling must be understood as a process with some degree of domain-dependent noise, and that any claim of predictive prowess must be sensitive to the degree of this noise. We present a method for quantifying labeling noise in a particular domain in which people are seen to disagree with their own past selves on the appropriate label to assign to a record: choices under prospect uncertainty. Our results indicate that 'state-of-the-art' choice models of decisions from description, by failing to consider the intrinsic variability of human choice behavior, find themselves in the odd position of predicting humans' choices better than the same humans' own previous choices for the same problem. We conclude with observations on how the predicament we empirically demonstrate in our work could be handled in the practice of supervised learning.
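To make the comparison in the abstract concrete, the following is a minimal sketch, not the authors' method, of how one might compare a model's accuracy against a labeler's own test-retest consistency. The function name `agreement`, the variable names, and the toy choice data are all hypothetical and chosen only for illustration.

```python
def agreement(a, b):
    """Fraction of positions at which two equal-length label sequences match."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical toy data: one person's choices on the same eight problems,
# recorded on two separate occasions, plus a model's predictions.
first_pass = ["A", "B", "A", "A", "B", "A", "B", "B"]
second_pass = ["A", "A", "A", "B", "B", "A", "B", "A"]
model_preds = ["A", "B", "A", "B", "B", "A", "B", "A"]

# How often the person agrees with their own earlier choice.
self_consistency = agreement(first_pass, second_pass)

# How often the model matches the person's later choices.
model_accuracy = agreement(model_preds, second_pass)

# A model that beats the person's own earlier choices as a predictor
# is reporting accuracy above the noise level of the labels themselves.
exceeds_self_consistency = model_accuracy > self_consistency

print(f"self-consistency: {self_consistency:.3f}")
print(f"model accuracy:   {model_accuracy:.3f}")
print(f"model exceeds self-consistency: {exceeds_self_consistency}")
```

In this toy example the model matches the later choices more often than the person's own earlier choices do, which is the kind of result the abstract describes as odd. Simple agreement is used here as an assumed stand-in for whatever noise measure a given study adopts.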
Keywords:
Uncertainty in AI: Uncertainty Representations
Humans and AI: Cognitive Systems
Machine Learning: Classification
Machine Learning Applications: Applications of Supervised Learning