Quality Control Attack Schemes in Crowdsourcing

Alessandro Checco, Jo Bates, Gianluca Demartini

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Sister Conferences Best Papers track. Pages 6136–6140. https://doi.org/10.24963/ijcai.2019/850

An important precondition for building effective AI models is the collection of training data at scale. Crowdsourcing is a popular methodology to achieve this goal, but its adoption introduces novel data quality control challenges: dealing with under-performing and malicious annotators. One of the most popular quality assurance mechanisms, especially in paid micro-task crowdsourcing, is the use of a small set of pre-annotated tasks as a gold standard to assess annotator quality in real time. In this paper, we highlight a set of vulnerabilities this scheme suffers from: a group of colluding crowd workers can easily implement and deploy a decentralised machine learning inferential system that detects and signals which parts of the task are more likely to be gold questions, making them ineffective as a quality control tool. Moreover, we demonstrate how the most common countermeasures against this attack are ineffective in practical scenarios. The basic architecture of the inferential system consists of a browser plug-in and an external server through which the colluding workers share information. We implement and validate the attack scheme by means of experiments on real-world data from a popular crowdsourcing platform.
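To make the architecture concrete, below is a minimal sketch of the server side of such an attack, under stated assumptions rather than as the paper's exact method: tasks are keyed by a content hash computed by the browser plug-in, and a simple frequency heuristic stands in for the paper's inferential model, exploiting the fact that gold questions are reused across many workers while ordinary tasks receive only a few judgments each. The class name, threshold, and helper functions are hypothetical.

```python
# Sketch (assumption-laden) of the shared server that colluding workers
# report to. Not the paper's implementation: the content-hash keying and
# the distinct-worker threshold are illustrative stand-ins.
import hashlib
from collections import defaultdict


class GoldDetectionServer:
    """Aggregates task sightings reported by colluding workers."""

    def __init__(self, min_workers: int = 5):
        # A task seen by at least `min_workers` distinct workers is
        # flagged as a likely gold question (threshold is illustrative).
        self.min_workers = min_workers
        self.sightings = defaultdict(set)  # task hash -> set of worker ids

    @staticmethod
    def task_hash(task_content: str) -> str:
        # The browser plug-in would hash the rendered task content so that
        # identical tasks map to the same key across different workers.
        return hashlib.sha256(task_content.encode("utf-8")).hexdigest()

    def report(self, worker_id: str, task_content: str) -> None:
        # Called by each colluding worker's plug-in for every task shown.
        self.sightings[self.task_hash(task_content)].add(worker_id)

    def is_likely_gold(self, task_content: str) -> bool:
        # Recurrence of the same task across many distinct workers is the
        # signal exploited here: ordinary tasks are judged by few workers,
        # whereas gold questions are injected into many workers' streams.
        return len(self.sightings[self.task_hash(task_content)]) >= self.min_workers


# Example: once enough colluding workers have reported the same task,
# the plug-in can warn subsequent workers that it is likely a gold question.
server = GoldDetectionServer(min_workers=3)
for worker in ("w1", "w2", "w3"):
    server.report(worker, "Is this tweet positive or negative? <tweet 42>")
print(server.is_likely_gold("Is this tweet positive or negative? <tweet 42>"))  # True
```

Once flagged, such tasks could be answered carefully (or signalled to other workers) while the remaining tasks are answered with less effort, which is what renders the gold standard ineffective as a quality control tool.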
Keywords:
Humans and AI: Human Computation and Crowdsourcing
Machine Learning Applications: Applications of Unsupervised Learning
Machine Learning: Online Learning
Multidisciplinary Topics and Applications: Information Retrieval