A Primal-dual Perspective for Distributed TD-learning
Han Dong Lim, Donghwan Lee
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 5698-5706.
https://doi.org/10.24963/ijcai.2025/634
The goal of this paper is to investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process. The proposed approach is based on distributed optimization algorithms, which can be interpreted as primal-dual ordinary differential equation (ODE) dynamics subject to null-space constraints. Exploiting the exponential convergence of these primal-dual dynamics, we examine the behavior of the final iterate in various distributed TD-learning scenarios, under both constant and diminishing step-sizes and under both i.i.d. and Markovian observation models. Unlike existing methods, the proposed algorithm does not require the underlying communication network to be characterized by a doubly stochastic matrix.
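As background for the distributed setting studied in the paper, the following is a minimal sketch of single-agent TD(0) with linear function approximation and a diminishing step-size, the basic building block that distributed TD-learning extends. The Markov reward process, feature matrix, and step-size schedule below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hedged sketch: single-agent TD(0) with linear function approximation.
# The 3-state MRP, the 2-dimensional features, and the step-size
# schedule alpha_t = (t+1)^{-0.75} are all illustrative choices.
rng = np.random.default_rng(0)

n_states, n_features = 3, 2
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])            # transition matrix
r = np.array([1.0, 0.0, -1.0])             # reward per state
gamma = 0.9                                # discount factor
Phi = rng.standard_normal((n_states, n_features))  # feature matrix

theta = np.zeros(n_features)
s = 0
for t in range(50000):
    s_next = rng.choice(n_states, p=P[s])
    # TD error: delta = r(s) + gamma * phi(s')^T theta - phi(s)^T theta
    delta = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    # Stochastic update along phi(s) with diminishing step-size
    theta += (1.0 / (t + 1) ** 0.75) * delta * Phi[s]
    s = s_next

print("learned theta:", np.round(theta, 3))
```

Under standard conditions, the iterate converges to the TD fixed point solving $\Phi^\top D (I - \gamma P)\Phi\,\theta^* = \Phi^\top D r$, where $D$ is the diagonal matrix of the stationary distribution; the distributed algorithms in the paper couple such updates across agents through primal-dual consensus dynamics.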
Keywords:
Machine Learning: ML: Reinforcement learning
Agent-based and Multi-agent Systems: MAS: Multi-agent learning
