Understanding PII Leakage in Large Language Models: A Systematic Survey
Shuai Cheng, Zhao Li, Shu Meng, Mengxia Ren, Haitao Xu, Shuai Hao, Chuan Yue, Fan Zhang
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Survey Track. Pages 10409-10417.
https://doi.org/10.24963/ijcai.2025/1156
Large Language Models (LLMs) have demonstrated exceptional success across a variety of tasks, particularly in natural language processing, leading to their growing integration into numerous facets of daily life. However, this widespread deployment has raised substantial privacy concerns, especially regarding personally identifiable information (PII), which can be directly linked to specific individuals. The leakage of such information poses significant real-world privacy threats. In this paper, we systematically review existing research on PII leakage in LLMs, covering commonly used PII datasets, evaluation metrics, and current work on both PII leakage attacks and defensive strategies. Finally, we identify unresolved challenges in the current research landscape and suggest directions for future research.
Keywords:
Multidisciplinary Topics and Applications: MTA: Security and privacy
AI Ethics, Trust, Fairness: ETF: Ethical, legal and societal issues
AI Ethics, Trust, Fairness: ETF: Safety and robustness
Natural Language Processing: NLP: Language models
