Private Semi-Supervised Federated Learning

Chenyou Fan, Junjie Hu, Jianwei Huang

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 2009-2015. https://doi.org/10.24963/ijcai.2022/279

We study a federated learning (FL) framework for effectively training models from labeled data that is scarce and skewed across clients. We consider a challenging yet practical scenario: a few data sources own small amounts of labeled data, while the remaining majority own purely unlabeled data. Classical FL requires each client to have enough labeled data for local training and is therefore not applicable in this scenario. In this work, we design an effective federated semi-supervised learning framework (FedSSL) that fully leverages both labeled and unlabeled data sources. We establish a unified data space across all participating agents, so that each agent can generate mixed data samples to boost semi-supervised learning (SSL) while keeping data local. We further show that FedSSL can integrate differential privacy protection techniques to prevent labeled-data leakage with minimal performance degradation. On SSL tasks with as little as 0.17% of MNIST and 1% of CIFAR-10 as labeled data, our approach achieves a 5-20% performance boost over state-of-the-art methods.
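The abstract's notion of generating "mixed data samples" for semi-supervised learning is commonly realized with MixUp-style interpolation between a labeled sample and another (e.g. pseudo-labeled) sample. The sketch below is a hypothetical illustration of that idea only; the `alpha` parameter and the exact mixing scheme used by FedSSL are not specified in this abstract and are assumptions here.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.75):
    """MixUp-style convex combination of two samples (illustrative sketch;
    not the paper's exact scheme). Labels y1, y2 are one-hot / soft vectors."""
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # keep the mixed sample closer to (x1, y1)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

# Example: mix a labeled sample with a pseudo-labeled unlabeled sample
x_mix, y_mix = mixup(np.zeros(4), np.array([1.0, 0.0]),
                     np.ones(4), np.array([0.0, 1.0]))
```

Because the mixed sample is a convex combination, its label mass still sums to one, so it can be trained on with a standard cross-entropy loss.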
Keywords:
Data Mining: Federated Learning
Data Mining: Privacy Preserving Data Mining
Machine Learning: Semi-Supervised Learning
Machine Learning: Generative Adversarial Networks