Where Does This Data Come From? Enhanced Source Inference Attacks in Federated Learning

Haiyang Chen, Xiaolong Xu, Xiang Zhu, Xiaokang Zhou, Fei Dai, Yansong Gao, Xiao Chen, Shuo Wang, Hongsheng Hu

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 4815-4823. https://doi.org/10.24963/ijcai.2025/536

Federated learning (FL) enables collaborative model training without exposing raw data, offering a privacy-aware alternative to centralized learning. However, FL remains vulnerable to various privacy attacks that exploit shared model updates, including membership inference, property inference, and gradient inversion. Source inference attacks further threaten FL by identifying which client contributed a specific training sample, posing severe risks to user and institutional privacy. Existing source inference attacks mainly assume passive adversaries and overlook more realistic scenarios where the server actively manipulates the training process. In this paper, we present an enhanced source inference attack that demonstrates how a malicious server can amplify behavioral differences between clients to more accurately infer data origin. Our approach introduces active training manipulation and data augmentation to expose client-specific patterns. Experimental results across five representative FL algorithms and multiple datasets show that our method significantly outperforms prior passive attacks. These findings reveal a deeper level of privacy vulnerability in FL and call for stronger defense mechanisms under active threat models.
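
For intuition about the passive baselines the abstract contrasts against, the sketch below shows the loss-based attribution that source inference attacks are commonly built on: the server scores a target record against each client's local model and attributes it to the best-fitting client. This is an illustrative reconstruction, not the paper's implementation; the names (`infer_source`, `client_models`) are hypothetical, and the paper's contribution (active training manipulation and data augmentation) would operate on top of this scoring step to widen the gap between the true source and the other clients.

```python
import torch
import torch.nn.functional as F

def infer_source(client_models, x, y):
    """Attribute a labeled record (x, y) to the client whose local
    model fits it best, i.e. yields the lowest cross-entropy loss.

    This is the passive baseline only; per the abstract, an active
    server would first manipulate training so that client-specific
    behavioral differences (loss gaps) become more pronounced.
    """
    losses = []
    for model in client_models:
        model.eval()
        with torch.no_grad():
            logits = model(x.unsqueeze(0))            # add batch dimension
            loss = F.cross_entropy(logits, y.view(1)) # per-client fit score
        losses.append(loss.item())
    # Predicted source: index of the client with the smallest loss.
    return min(range(len(losses)), key=losses.__getitem__)
```

Under this framing, attack accuracy hinges on how separable the per-client losses are for a given record, which is precisely the quantity the paper's active manipulation aims to amplify.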
Keywords:
Machine Learning: ML: Federated learning
AI Ethics, Trust, Fairness: ETF: Trustworthy AI
Machine Learning: ML: Trustworthy machine learning
Multidisciplinary Topics and Applications: MTA: Security and privacy