Sanitizing Backdoored Graph Neural Networks: A Multidimensional Approach
Rong Zhao, Jilian Zhang, Yu Wang, Yinyan Zhang, Jian Weng
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 7949-7957.
https://doi.org/10.24963/ijcai.2025/884
Graph Neural Networks (GNNs) are known to be prone to adversarial attacks, among which the backdoor attack is a major security threat. By injecting backdoor triggers into a graph and assigning a target class label to nodes attached to the triggers, the attacker can mislead a GNN model trained on the poisoned graph into classifying test nodes attached to a trigger as the target class. To defend against backdoor attacks, existing defense methods rely on anomaly detection in feature distributions or label transformations. However, these approaches are incapable of detecting in-distribution triggers or clean-label attacks that do not alter the class labels of target nodes. To tackle these threats, we empirically analyze triggers from a multidimensional perspective, and our analysis shows that there are clear distinctions between trigger nodes and normal ones in terms of node feature values, node embeddings, and class prediction probabilities. Based on these findings, we propose a Multidimensional Anomaly Detection framework (MAD) that can effectively minimize the impact of triggers by pruning away anomalous nodes and edges. Extensive experiments show that, at the cost of a slight loss in clean classification accuracy, MAD achieves a considerably lower attack success rate than state-of-the-art backdoor defense methods.
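To make the pruning idea concrete, the sketch below illustrates the general pattern of multidimensional anomaly-based sanitization: score each node along the three views named in the abstract (feature values, embeddings, prediction probabilities) and drop nodes and incident edges whose scores are anomalous. This is a minimal illustration under simplifying assumptions (z-score outlier scoring, a max-over-views combination, and a fixed threshold), not the authors' MAD implementation.

```python
# Illustrative sketch only -- NOT the authors' MAD method. It assumes a simple
# per-view z-score outlier test and combines views by taking the maximum score.
import torch

def zscore_outlier_score(x: torch.Tensor) -> torch.Tensor:
    """Per-node outlier score: maximum absolute z-score across dimensions."""
    mu = x.mean(dim=0, keepdim=True)
    sigma = x.std(dim=0, keepdim=True) + 1e-8
    return ((x - mu) / sigma).abs().max(dim=1).values

def prune_anomalous_nodes(features, embeddings, probs, edge_index, threshold=3.0):
    """Flag anomalous nodes and remove edges incident to them.

    features:   [N, F] raw node features
    embeddings: [N, D] node embeddings from a (possibly backdoored) GNN
    probs:      [N, C] class prediction probabilities
    edge_index: [2, E] COO edge list
    Returns a boolean keep-mask over nodes and the pruned edge list.
    """
    score = torch.stack([
        zscore_outlier_score(features),
        zscore_outlier_score(embeddings),
        zscore_outlier_score(probs),
    ]).max(dim=0).values              # most anomalous view per node
    keep = score < threshold          # nodes to retain
    src, dst = edge_index
    edge_mask = keep[src] & keep[dst] # drop edges touching pruned nodes
    return keep, edge_index[:, edge_mask]
```

In practice, the sanitized graph (retained nodes and edges) would then be used to retrain or fine-tune the GNN; the threshold and scoring functions here are placeholders for whatever criteria a defense actually derives from its analysis of trigger nodes.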
Keywords:
Multidisciplinary Topics and Applications: MTA: Security and privacy
AI Ethics, Trust, Fairness: ETF: Safety and robustness
Machine Learning: ML: Adversarial machine learning
