Heterogeneous Federated Learning with Scalable Server Mixture-of-Experts
Jingang Jiang, Yanzhao Chen, Xiangyang Liu, Haiqi Jiang, Chenyou Fan
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 5480-5488.
https://doi.org/10.24963/ijcai.2025/610
Classical Federated Learning (FL) encounters significant challenges when deploying large models on power-constrained clients. To tackle this, we propose an asymmetric FL mechanism that aggregates compact client models into a comprehensive server model. We design the server model as a Mixture-of-Experts (MoE), where each expert shares the architecture of the client models. This uniformity allows the most pertinent client models to be efficiently fused into each server expert, based on the measured relevance between each client model and that expert. To address the Non-IID data issue, we further optimize the server-side MoE architecture by incorporating a main expert that is always activated alongside a set of selectively activated routed experts. This configuration balances learning general knowledge with fitting client-specific data distributions. Our Fed-MoE framework is model-agnostic and has demonstrated notable improvements on vision FL tasks with million-scale ResNet backbones and on language tasks with billion-scale BERT and GPT-2 backbones.
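The abstract outlines two mechanisms that lend themselves to a concrete illustration: a server-side MoE in which a main expert is always activated alongside sparsely routed experts, and a relevance-weighted fusion of client models into each server expert. Below is a minimal PyTorch sketch of these two ideas; the class and function names, the MLP experts, the softmax gating, and the softmax-over-relevance fusion rule are all illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: names, shapes, and the gating/fusion details below
# are assumptions for exposition, not the authors' Fed-MoE implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FedMoEServer(nn.Module):
    """Server-side MoE with one always-active main expert plus routed experts.

    Each expert is built by `make_expert` and is assumed to share the
    architecture of the compact client models.
    """

    def __init__(self, make_expert, feat_dim=128, num_routed=4, top_k=1):
        super().__init__()
        self.main_expert = make_expert()                      # always activated
        self.routed_experts = nn.ModuleList(
            make_expert() for _ in range(num_routed)
        )                                                     # selectively activated
        self.router = nn.Linear(feat_dim, num_routed)         # gating network
        self.top_k = top_k

    def forward(self, x):
        # Main expert captures general knowledge on every input.
        main_out = self.main_expert(x)

        # Sparse gate: keep only the top-k routed experts per sample.
        gate = F.softmax(self.router(x), dim=-1)              # (B, num_routed)
        vals, idx = gate.topk(self.top_k, dim=-1)
        sparse_gate = torch.zeros_like(gate).scatter_(-1, idx, vals)

        # For clarity every routed expert is evaluated here; only the gate is
        # sparse. A production system would dispatch inputs to experts instead.
        routed_out = torch.stack(
            [expert(x) for expert in self.routed_experts], dim=1
        )                                                     # (B, num_routed, C)
        return main_out + (sparse_gate.unsqueeze(-1) * routed_out).sum(dim=1)


def fuse_clients_into_expert(expert, client_models, relevance):
    """Update one server expert from its most relevant client models.

    `relevance` is a 1-D tensor of client-to-expert relevance scores (how the
    scores are measured is left open here); clients are fused into the expert
    by a softmax-weighted parameter average.
    """
    weights = F.softmax(relevance, dim=0)
    fused = {
        name: sum(w * cm.state_dict()[name] for w, cm in zip(weights, client_models))
        for name in expert.state_dict()
    }
    expert.load_state_dict(fused)


if __name__ == "__main__":
    # Toy experts: small MLP classifiers over 128-d features, 10 classes.
    make_expert = lambda: nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    server = FedMoEServer(make_expert, feat_dim=128, num_routed=4, top_k=1)
    logits = server(torch.randn(8, 128))                      # (8, 10)

    # Fuse three client models into routed expert 0 using toy relevance scores.
    clients = [make_expert() for _ in range(3)]
    fuse_clients_into_expert(server.routed_experts[0],
                             clients,
                             torch.tensor([0.9, 0.5, 0.1]))
```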
Keywords:
Machine Learning: ML: Federated learning
Machine Learning: ML: Deep learning architectures
Natural Language Processing: NLP: Language models
Natural Language Processing: NLP: Text classification
