Black-Box Data Poisoning Attacks on Crowdsourcing

Pengpeng Chen, Yongqiang Yang, Dingqi Yang, Hailong Sun, Zhijun Chen, Peng Lin

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 2975-2983. https://doi.org/10.24963/ijcai.2023/332

Understanding the vulnerability of label aggregation to data poisoning attacks is key to ensuring data quality in crowdsourced label collection. State-of-the-art attack mechanisms generally assume full knowledge of the aggregation models while failing to consider the flexibility of malicious workers in selecting which instances to label. Such a setup limits the applicability of these attack mechanisms and impedes further improvement of their success rate. This paper introduces a black-box data poisoning attack framework that finds optimal strategies for instance selection and labeling to attack unknown label aggregation models in crowdsourcing. We formulate the attack problem on top of a generic formalization of label aggregation models and then introduce a substitution approach that attacks a substitute aggregation model in place of the unknown model. Through extensive validation on multiple real-world datasets, we demonstrate the effectiveness of both instance selection and model substitution in improving the success rate of attacks.
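To make the substitution idea concrete, below is a minimal illustrative sketch, not the authors' algorithm: it uses plain majority voting as a stand-in ("substitute") aggregator and a margin-based heuristic for instance selection, both of which are assumptions introduced here for illustration only.

```python
"""Illustrative sketch of a substitution-style poisoning attack.
NOT the paper's method: the substitute aggregator (majority vote) and the
margin-based instance-selection heuristic are assumptions for illustration."""
from collections import Counter
import random


def aggregate_majority(labels_per_instance):
    """Substitute aggregation model: majority vote over each instance's labels."""
    return {i: Counter(ls).most_common(1)[0][0]
            for i, ls in labels_per_instance.items()}


def select_and_poison(labels_per_instance, budget, num_attackers):
    """Select the `budget` instances with the smallest vote margin (cheapest to
    flip) and have each malicious worker submit the runner-up label there."""
    margins = {}
    for i, ls in labels_per_instance.items():
        counts = Counter(ls).most_common()
        top = counts[0][1]
        second = counts[1][1] if len(counts) > 1 else 0
        margins[i] = top - second
    targets = sorted(margins, key=margins.get)[:budget]

    poisoned = {i: list(ls) for i, ls in labels_per_instance.items()}
    for i in targets:
        counts = Counter(labels_per_instance[i]).most_common()
        # Adversarial label: the strongest competitor to the honest majority.
        adv_label = counts[1][0] if len(counts) > 1 else 1 - counts[0][0]
        poisoned[i].extend([adv_label] * num_attackers)
    return poisoned, targets


# Toy run: 6 instances, binary labels from 5 honest workers each.
random.seed(0)
honest = {i: [random.choice([0, 1]) for _ in range(5)] for i in range(6)}
clean_agg = aggregate_majority(honest)
poisoned, targets = select_and_poison(honest, budget=2, num_attackers=3)
poisoned_agg = aggregate_majority(poisoned)
flipped = [i for i in targets if clean_agg[i] != poisoned_agg[i]]
print("attacked instances:", targets, "-> flipped:", flipped)
```

In a black-box setting the true aggregator is unknown, so the attacker optimizes against a substitute like the one above and transfers the resulting instance-selection and labeling strategy to the real system; the paper's contribution is to do this against a generic formalization of aggregation models rather than the simple majority vote used here.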
Keywords:
Humans and AI: HAI: Human-AI collaboration
Humans and AI: HAI: Human computation and crowdsourcing
Machine Learning: ML: Robustness