Heterogeneous Federated Learning with Scalable Server Mixture-of-Experts
Jingang Jiang, Yanzhao Chen, Xiangyang Liu, Haiqi Jiang, Chenyou Fan
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 5480-5488.
https://doi.org/10.24963/ijcai.2025/610
Classical Federated Learning (FL) encounters significant challenges when deploying large models on power-constrained clients. To tackle this, we propose an asymmetric FL mechanism that aggregates compact client models into a comprehensive server model. We design the server model as a Mixture-of-Experts (MoE), where each expert shares the architecture of the client models. This uniformity allows the most pertinent client models to be efficiently fused into each server expert, based on the measured relevance between each client model and that expert. To address the Non-IID data issue, we further optimize the server-side MoE architecture by incorporating a main expert that is always activated alongside a set of selectively activated routed experts. This configuration balances learning general knowledge with fitting client-specific data distributions. Our Fed-MoE framework is model-agnostic and has demonstrated notable improvements on vision FL tasks with million-scale ResNet backbones and on language tasks with billion-scale BERT and GPT-2 backbones.
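The abstract outlines two mechanisms that lend themselves to a concrete illustration: a server-side MoE in which a main expert is always activated alongside sparsely routed experts, and a relevance-weighted fusion of client models into each server expert. Below is a minimal PyTorch sketch of these two ideas; the class and function names, the MLP experts, the softmax gating, and the softmax-over-relevance fusion rule are all illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: names, shapes, and the gating/fusion details below
# are assumptions for exposition, not the authors' Fed-MoE implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FedMoEServer(nn.Module):
    """Server-side MoE with one always-active main expert plus routed experts.

    Each expert is built by `make_expert` and is assumed to share the
    architecture of the compact client models.
    """

    def __init__(self, make_expert, feat_dim=128, num_routed=4, top_k=1):
        super().__init__()
        self.main_expert = make_expert()                      # always activated
        self.routed_experts = nn.ModuleList(
            make_expert() for _ in range(num_routed)
        )                                                     # selectively activated
        self.router = nn.Linear(feat_dim, num_routed)         # gating network
        self.top_k = top_k

    def forward(self, x):
        # Main expert captures general knowledge on every input.
        main_out = self.main_expert(x)

        # Sparse gate: keep only the top-k routed experts per sample.
        gate = F.softmax(self.router(x), dim=-1)              # (B, num_routed)
        vals, idx = gate.topk(self.top_k, dim=-1)
        sparse_gate = torch.zeros_like(gate).scatter_(-1, idx, vals)

        # For clarity every routed expert is evaluated here; only the gate is
        # sparse. A production system would dispatch inputs to experts instead.
        routed_out = torch.stack(
            [expert(x) for expert in self.routed_experts], dim=1
        )                                                     # (B, num_routed, C)
        return main_out + (sparse_gate.unsqueeze(-1) * routed_out).sum(dim=1)


def fuse_clients_into_expert(expert, client_models, relevance):
    """Update one server expert from its most relevant client models.

    `relevance` is a 1-D tensor of client-to-expert relevance scores (how the
    scores are measured is left open here); clients are fused into the expert
    by a softmax-weighted parameter average.
    """
    weights = F.softmax(relevance, dim=0)
    fused = {
        name: sum(w * cm.state_dict()[name] for w, cm in zip(weights, client_models))
        for name in expert.state_dict()
    }
    expert.load_state_dict(fused)


if __name__ == "__main__":
    # Toy experts: small MLP classifiers over 128-d features, 10 classes.
    make_expert = lambda: nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    server = FedMoEServer(make_expert, feat_dim=128, num_routed=4, top_k=1)
    logits = server(torch.randn(8, 128))                      # (8, 10)

    # Fuse three client models into routed expert 0 using toy relevance scores.
    clients = [make_expert() for _ in range(3)]
    fuse_clients_into_expert(server.routed_experts[0],
                             clients,
                             torch.tensor([0.9, 0.5, 0.1]))
```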
Keywords:
Machine Learning: ML: Federated learning
Machine Learning: ML: Deep learning architectures
Natural Language Processing: NLP: Language models
Natural Language Processing: NLP: Text classification
