Steady-State Policy Synthesis in Multichain Markov Decision Processes

George Atia; Andre Beckus; Ismail Alkhouri; Alvaro Velasquez

doi:10.24963/ijcai.2020/563

Steady-State Policy Synthesis in Multichain Markov Decision Processes

George Atia, Andre Beckus, Ismail Alkhouri, Alvaro Velasquez

Short video

Long video

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence

Main track. Pages 4069-4075. https://doi.org/10.24963/ijcai.2020/563

PDF BibTeX

The formal synthesis of automated or autonomous agents has elicited strong interest from the artificial intelligence community in recent years. This problem space broadly entails the derivation of decision-making policies for agents acting in an environment such that a formal specification of behavior is satisfied. Popular formalisms for such specifications include the quintessential Linear Temporal Logic (LTL) and Computation Tree Logic (CTL) which reason over infinite sequences and trees, respectively, of states. However, the related and relevant problem of reasoning over the frequency with which states are visited infinitely and enforcing behavioral specifications on the same has received little attention. That problem, known as Steady-State Policy Synthesis (SSPS) or steady-state control, is the focus of this paper. Prior related work has been mostly confined to unichain Markov Decision Processes (MDPs), while a tractable solution to the general multichain setting heretofore remains elusive. In this paper, we provide a solution to the latter within the context of multichain MDPs over a class of policies that account for all possible transitions in the given MDP. The solution policy is derived from a novel linear program (LP) that encodes constraints on the limiting distributions of the Markov chain induced by said policy. We establish a one-to-one correspondence between the feasible solutions of the LP and the stationary distributions of the induced Markov chains. The derived policy is shown to maximize the reward among the constrained class of stationary policies and to satisfy the specification constraints even when it does not exercise all possible transitions.

Keywords:

Planning and Scheduling: Planning under Uncertainty

Planning and Scheduling: Markov Decisions Processes

Agent-based and Multi-agent Systems: Formal Verification, Validation and Synthesis