Orion: Online Backdoor Sample Detection via Evolution Deviance

Huayang Huang, Qian Wang, Xueluan Gong, Tao Wang

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 864-874. https://doi.org/10.24963/ijcai.2023/96

Widely used DNN models are vulnerable to backdoor attacks, in which the backdoored model misbehaves only on inputs carrying a specific trigger while maintaining high prediction accuracy on benign samples. Existing backdoor input detection strategies rely on the assumption that benign and poisoned samples are separable in the feature representation of the model. However, this assumption can be broken by advanced feature-hidden backdoor attacks. In this paper, we propose a novel detection framework, dubbed Orion (online backdoor sample detection via evolution deviance). Specifically, we analyze how predictions evolve during a forward pass and find that, for backdoor inputs, the shallow and deep outputs deviate from each other. By introducing side nets to track this evolution deviance, Orion no longer needs the latent separability assumption. Additionally, we put forward a scheme to restore the original label of backdoor samples, enabling more robust predictions. Extensive experiments on six attacks, three datasets, and two architectures verify the effectiveness of Orion. Orion outperforms state-of-the-art defenses and can identify feature-hidden attacks with an F1-score of 90%, compared to 40% for other detection schemes. Orion can also achieve 80% label recovery accuracy on basic backdoor attacks.
Keywords:
Computer Vision: CV: Adversarial learning, adversarial attack and defense methods
Machine Learning: ML: Adversarial machine learning
AI Ethics, Trust, Fairness: ETF: Safety and robustness
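
The detection idea in the abstract (comparing shallow and deep outputs of a single forward pass) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the deviance measure, the threshold, or the label recovery rule. The sketch assumes that hypothetical side nets have already produced class logits at shallow layers, uses the mean KL divergence from the final output to each side output as a stand-in deviance score, and recovers a label from the averaged shallow predictions. The names `evolution_deviance`, `detect`, and `threshold` are illustrative only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def evolution_deviance(side_probs, final_probs):
    """Assumed score: mean KL divergence from the deep (final) output
    to each shallow side-net output. Not taken from the paper."""
    return sum(kl_divergence(final_probs, s) for s in side_probs) / len(side_probs)

def detect(side_logits, final_logits, threshold=1.0):
    """Flag an input whose deep prediction deviates from its shallow ones.

    side_logits: list of logit vectors from hypothetical side nets,
                 ordered from shallow to deep.
    final_logits: logit vector of the model's final output.
    threshold: illustrative cut-off; in practice it would be calibrated
               on benign data.
    Returns (flagged, score, label).
    """
    final_probs = softmax(final_logits)
    side_probs = [softmax(l) for l in side_logits]
    score = evolution_deviance(side_probs, final_probs)
    flagged = score > threshold

    num_classes = len(final_probs)
    predicted = max(range(num_classes), key=final_probs.__getitem__)
    # Assumed recovery rule: argmax of the averaged shallow distributions.
    mean_side = [sum(s[i] for s in side_probs) / len(side_probs)
                 for i in range(num_classes)]
    recovered = max(range(num_classes), key=mean_side.__getitem__)
    return flagged, score, (recovered if flagged else predicted)

# Benign-like input: shallow and deep outputs agree on class 0.
benign = detect([[2.0, 0.0, 0.0], [3.0, 0.0, 0.0]], [5.0, 0.0, 0.0])
# Backdoor-like input: shallow outputs favor class 0, deep output jumps to class 2.
suspicious = detect([[2.0, 0.0, 0.0], [3.0, 0.0, 0.0]], [0.0, 0.0, 5.0])
```

In this toy setup the first input is left unflagged and keeps the model's own prediction, while the second is flagged and falls back to the class favored by the shallow outputs. A real deployment would need trained side nets and a threshold chosen from benign validation data.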