Decentralized No-regret Learning Algorithms for Extensive-form Correlated Equilibria (Extended Abstract)

Decentralized No-regret Learning Algorithms for Extensive-form Correlated Equilibria (Extended Abstract)

Andrea Celli, Alberto Marchesi, Gabriele Farina, Nicola Gatti

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Sister Conferences Best Papers. Pages 4755-4759. https://doi.org/10.24963/ijcai.2021/645

The existence of uncoupled no-regret learning dynamics converging to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form games generalize normal-form games by modeling both sequential and simultaneous moves, as well as imperfect information. Because of the sequential nature and the presence of private information, correlation in extensive-form games possesses significantly different properties than in normal-form games. The extensive-form correlated equilibrium (EFCE) is the natural extensive-form counterpart to the classical notion of correlated equilibrium in normal-form games. Compared to the latter, the constraints that define the set of EFCEs are significantly more complex, as the correlation device ({\em a.k.a.} mediator) must take into account the evolution of beliefs of each player as they make observations throughout the game. Due to this additional complexity, the existence of uncoupled learning dynamics leading to an EFCE has remained a challenging open research question for a long time. In this article, we settle that question by giving the first uncoupled no-regret dynamics which provably converge to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. We show that each iterate can be computed in time polynomial in the size of the game tree, and that, when all players play repeatedly according to our learning dynamics, the empirical frequency of play after T game repetitions is guaranteed to be a O(T^-1/2)-approximate EFCE with high probability, and an EFCE almost surely in the limit.
Keywords:
Agent-based and Multi-agent Systems: Multi-agent Learning
Agent-based and Multi-agent Systems: Algorithmic Game Theory
Agent-based and Multi-agent Systems: Noncooperative Games
Machine Learning: Reinforcement Learning