Sanitizing Backdoored Graph Neural Networks: A Multidimensional Approach
Rong Zhao, Jilian Zhang, Yu Wang, Yinyan Zhang, Jian Weng
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 7949-7957.
https://doi.org/10.24963/ijcai.2025/884
Graph Neural Networks (GNNs) are known to be prone to adversarial attacks, among which the backdoor attack is a major security threat. By injecting backdoor triggers into a graph and assigning a target class label to nodes attached to the triggers, the attacker can mislead a GNN model trained on the poisoned graph into classifying test nodes attached to a trigger as the target class. To defend against backdoor attacks, existing defense methods rely on anomaly detection in feature distributions or label transformations. However, these approaches are incapable of detecting in-distribution triggers or clean-label attacks that do not alter the class labels of target nodes. To tackle these threats, we empirically analyze triggers from a multidimensional perspective, and our analysis shows that there are clear distinctions between trigger nodes and normal ones in terms of node feature values, node embeddings, and class prediction probabilities. Based on these findings, we propose a Multidimensional Anomaly Detection framework (MAD) that can effectively minimize the impact of triggers by pruning away anomalous nodes and edges. Extensive experiments show that, at the cost of a slight loss in clean classification accuracy, MAD achieves a considerably lower attack success rate than state-of-the-art backdoor defense methods.
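To make the pruning idea concrete, the sketch below illustrates the general pattern of multidimensional anomaly-based sanitization: score each node along the three views named in the abstract (feature values, embeddings, prediction probabilities) and drop nodes and incident edges whose scores are anomalous. This is a minimal illustration under simplifying assumptions (z-score outlier scoring, a max-over-views combination, and a fixed threshold), not the authors' MAD implementation.

```python
# Illustrative sketch only -- NOT the authors' MAD method. It assumes a simple
# per-view z-score outlier test and combines views by taking the maximum score.
import torch

def zscore_outlier_score(x: torch.Tensor) -> torch.Tensor:
    """Per-node outlier score: maximum absolute z-score across dimensions."""
    mu = x.mean(dim=0, keepdim=True)
    sigma = x.std(dim=0, keepdim=True) + 1e-8
    return ((x - mu) / sigma).abs().max(dim=1).values

def prune_anomalous_nodes(features, embeddings, probs, edge_index, threshold=3.0):
    """Flag anomalous nodes and remove edges incident to them.

    features:   [N, F] raw node features
    embeddings: [N, D] node embeddings from a (possibly backdoored) GNN
    probs:      [N, C] class prediction probabilities
    edge_index: [2, E] COO edge list
    Returns a boolean keep-mask over nodes and the pruned edge list.
    """
    score = torch.stack([
        zscore_outlier_score(features),
        zscore_outlier_score(embeddings),
        zscore_outlier_score(probs),
    ]).max(dim=0).values              # most anomalous view per node
    keep = score < threshold          # nodes to retain
    src, dst = edge_index
    edge_mask = keep[src] & keep[dst] # drop edges touching pruned nodes
    return keep, edge_index[:, edge_mask]
```

In practice, the sanitized graph (retained nodes and edges) would then be used to retrain or fine-tune the GNN; the threshold and scoring functions here are placeholders for whatever criteria a defense actually derives from its analysis of trigger nodes.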
Keywords:
Multidisciplinary Topics and Applications: MTA: Security and privacy
AI Ethics, Trust, Fairness: ETF: Safety and robustness
Machine Learning: ML: Adversarial machine learning
