Finite-Time Analysis of Heterogeneous Federated Temporal Difference Learning
Ye Zhu, Xiaowen Gong, Shiwen Mao
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 7263-7271.
https://doi.org/10.24963/ijcai.2025/808
Federated Temporal Difference (FTD) learning has emerged as a promising framework for collaboratively evaluating policies without sharing raw data. Despite its potential, existing approaches often yield biased convergence results due to inherent challenges of federated reinforcement learning, such as multiple local updates and environment heterogeneity. In response, we study federated temporal difference (TD) learning, focusing on collaborative policy evaluation with linear function approximation among agents operating in heterogeneous environments. We devise a heterogeneous federated temporal difference (HFTD) algorithm that iteratively aggregates agents' local stochastic gradients for TD learning. The HFTD algorithm makes two major novel contributions: 1) it seeks the optimal value function model for the mixture environment, i.e., an environment drawn at random from the agents' heterogeneous environments, using the local gradients of agents' mean squared Bellman errors (MSBEs) for their respective environments; 2) it allows agents to perform different numbers of local iterations of TD learning according to their heterogeneous computational capabilities. We analyze the finite-time convergence of the HFTD algorithm under IID sampling and Markovian sampling, respectively. By characterizing bounds on the convergence error, we show that the HFTD algorithm converges exactly to the optimal model and achieves a linear speedup as the number of agents increases.
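To make the algorithmic idea in the abstract concrete, below is a minimal sketch of one communication round of federated TD(0) with linear function approximation, where each agent runs its own number of local updates in its own environment before the server averages the models. The helper names (`td0_gradient`, `hftd_round`, the `agent` dictionary layout) and the simple averaging rule are illustrative assumptions, not the authors' exact HFTD algorithm or its proven step-size schedule.

```python
# Illustrative sketch only: heterogeneous federated TD(0) with linear
# value approximation V(s) = phi(s)^T theta. Names and update details
# are assumptions for exposition, not the paper's exact algorithm.
import numpy as np

def td0_gradient(theta, phi_s, phi_s_next, reward, gamma):
    """TD(0) semi-gradient direction for one transition."""
    td_error = reward + gamma * phi_s_next @ theta - phi_s @ theta
    return td_error * phi_s

def hftd_round(theta_global, agents, step_size, gamma, rng):
    """One communication round: each agent performs its own (possibly
    different) number of local TD(0) steps in its own environment,
    then the server averages the resulting local models."""
    local_models = []
    for agent in agents:
        theta = theta_global.copy()
        # Heterogeneous computation: agents differ in local step counts.
        for _ in range(agent["local_steps"]):
            phi_s, phi_s_next, r = agent["sample_transition"](rng)
            theta += step_size * td0_gradient(theta, phi_s, phi_s_next, r, gamma)
        local_models.append(theta)
    # Server aggregation (simple average shown here).
    return np.mean(local_models, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4  # feature dimension

    def make_agent(local_steps, noise):
        # Hypothetical environment: IID random features and noisy rewards;
        # the per-agent noise level stands in for environment heterogeneity.
        def sample_transition(rng):
            phi_s = rng.normal(size=d)
            phi_s_next = rng.normal(size=d)
            r = 1.0 + noise * rng.normal()
            return phi_s, phi_s_next, r
        return {"local_steps": local_steps, "sample_transition": sample_transition}

    agents = [make_agent(local_steps=k + 1, noise=0.1 * k) for k in range(5)]
    theta = np.zeros(d)
    for _ in range(100):  # communication rounds
        theta = hftd_round(theta, agents, step_size=0.05, gamma=0.9, rng=rng)
    print(theta)
```

In this toy setup the averaged iterates approximate a fixed point shared across the agents' environments, loosely mirroring the paper's notion of converging to the optimal model of the mixture environment; the actual convergence guarantees and sampling regimes (IID vs. Markovian) are established in the paper itself.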
Keywords:
Machine Learning: ML: Reinforcement learning
Machine Learning: ML: Federated learning
