Efficient Label Contamination Attacks Against Black-Box Learning Models

Mengchen Zhao, Bo An, Wei Gao, Teng Zhang

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 3945-3951. https://doi.org/10.24963/ijcai.2017/551

Label contamination attack (LCA) is an important type of data poisoning attack in which an attacker manipulates the labels of training data to make the learned model beneficial to him. Existing work on LCA assumes that the attacker has full knowledge of the victim learning model, whereas the victim model is usually a black-box to the attacker. In this paper, we develop a Projected Gradient Ascent (PGA) algorithm to compute LCAs on a family of empirical risk minimizations and show that an attack on one victim model can also be effective on other victim models. This makes it possible for the attacker to design an attack against a substitute model and transfer it to a black-box victim model. Based on this transferability, we develop a defense algorithm to identify the data points that are most likely to be attacked. Empirical studies show that PGA significantly outperforms existing baselines and that linear learning models are better substitute models than nonlinear ones.
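The sketch below illustrates the general idea of a projected-gradient-ascent label contamination attack against a substitute model, as described in the abstract. It is not the authors' exact PGA algorithm: the toy dataset, the soft-label logistic-regression substitute, the attacker objective, the numerical gradient (the paper derives analytic gradients), and the L1 budget projection are all assumptions made for illustration.

```python
# Illustrative sketch (assumed details, not the paper's algorithm):
# relax training labels to q in [0,1]^n, ascend the gradient of an
# attacker objective, and project back onto a budgeted feasible set.
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (assumed for illustration).
n, d = 60, 2
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.5])
y = (X @ w_true > 0).astype(float)          # clean labels in {0, 1}

X_val, y_val = X[:20], y[:20]               # attacker wants high error here
X_tr, y_tr = X[20:], y[20:]

def train_substitute(X, q, steps=200, lr=0.5):
    """Train a soft-label logistic regression (the substitute model) on labels q."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - q) / len(q)    # gradient of the logistic loss
    return w

def attacker_loss(q):
    """Attacker objective: error of the substitute trained on labels q,
    measured against the true validation labels (to be maximised)."""
    w = train_substitute(X_tr, q)
    p = 1.0 / (1.0 + np.exp(-X_val @ w))
    eps = 1e-9
    return -np.mean(y_val * np.log(p + eps) + (1 - y_val) * np.log(1 - p + eps))

def project(q, y_clean, budget):
    """Project relaxed labels onto [0,1]^n with an L1 budget on deviation."""
    q = np.clip(q, 0.0, 1.0)
    dev = q - y_clean
    l1 = np.abs(dev).sum()
    if l1 > budget:                         # shrink deviations to meet the budget
        q = y_clean + dev * (budget / l1)
    return q

# Projected gradient ascent on the relaxed label vector.
budget = 5.0                                # roughly five labels' worth of change
q = y_tr.astype(float).copy()
step = 0.5
for it in range(30):
    # Numerical gradient of the attacker loss w.r.t. each label (fine for a toy).
    base = attacker_loss(q)
    grad = np.zeros_like(q)
    h = 1e-3
    for i in range(len(q)):
        q_h = q.copy()
        q_h[i] += h
        grad[i] = (attacker_loss(q_h) - base) / h
    q = project(q + step * grad, y_tr, budget)

poisoned = (q > 0.5).astype(float)          # round back to hard labels
print("flipped labels:", int(np.sum(poisoned != y_tr)))
print("attacker loss clean vs poisoned: %.3f -> %.3f"
      % (attacker_loss(y_tr), attacker_loss(poisoned)))
```

In this sketch the poisoned labels are crafted entirely against the substitute model; the transferability observation in the paper is what suggests such an attack may also degrade a black-box victim model trained on the same contaminated data.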
Keywords:
Multidisciplinary Topics and Applications: AI&Security and Privacy