Reduction Techniques for Model Checking and Learning in MDPs

Suda Bharadwaj, Stephane Le Roux, Guillermo Perez, Ufuk Topcu

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 4273-4279. https://doi.org/10.24963/ijcai.2017/597

Omega-regular objectives in Markov decision processes (MDPs) reduce to reachability: find a policy which maximizes the probability of reaching a target set of states. Given an MDP, an initial distribution, and a target set of states, such a policy can be computed by most probabilistic model checking tools. If the MDP is only partially specified, i.e., some probabilities are unknown, then model-learning techniques can be used to statistically approximate the probabilities and enable the computation of the desired policy. For fully specified MDPs, reducing the size of the MDP translates into faster model checking; for partially specified MDPs, into faster learning. We provide reduction techniques that allow us to remove irrelevant transition probabilities: transition probabilities (known, or to be learned) that do not influence the maximal reachability probability. Among other applications, these reductions can be seen as a pre-processing of MDPs before model checking or as a way to reduce the number of experiments required to obtain a good approximation of an unknown MDP.
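For illustration only (this sketch is not taken from the paper), the maximal reachability probability referred to in the abstract can be computed by standard value iteration on a fully specified MDP: iterate x(s) := max over actions a of the sum over successors s' of P(s,a,s') * x(s'), starting from 1 on the target and 0 elsewhere. The MDP, state names, and probabilities below are hypothetical.

    # Minimal, assumed example: value iteration for Pr^max(reach target).
    # transitions[state][action] = list of (next_state, probability) pairs.
    transitions = {
        "s0": {"a": [("s1", 0.6), ("s2", 0.4)], "b": [("s2", 1.0)]},
        "s1": {"a": [("goal", 0.9), ("s0", 0.1)]},
        "s2": {"a": [("s2", 1.0)]},          # sink: the goal is unreachable from here
        "goal": {"a": [("goal", 1.0)]},
    }
    target = {"goal"}

    def max_reachability(transitions, target, iterations=1000, tol=1e-10):
        """Approximate Pr^max(reach target) from every state by value iteration."""
        values = {s: (1.0 if s in target else 0.0) for s in transitions}
        for _ in range(iterations):
            new_values = {}
            for s, acts in transitions.items():
                if s in target:
                    new_values[s] = 1.0
                    continue
                # The best action maximizes the expected value of the successors.
                new_values[s] = max(
                    sum(p * values[t] for t, p in succ) for succ in acts.values()
                )
            if max(abs(new_values[s] - values[s]) for s in values) < tol:
                values = new_values
                break
            values = new_values
        return values

    print(max_reachability(transitions, target))
    # Here values["s0"] is roughly 0.574 (play "a" in s0 and s1). Action "b" and
    # the probabilities inside the sink "s2" never influence this maximum, which is
    # the kind of irrelevant transition probability the reductions aim to remove.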
Keywords:
Planning and Scheduling: Markov Decisions Processes
Uncertainty in AI: Sequential Decision Making
Multidisciplinary Topics and Applications: Validation and Verification
Uncertainty in AI: Uncertainty in AI