Reduction Techniques for Model Checking and Learning in MDPs

Suda Bharadwaj, Stephane Le Roux, Guillermo Perez, Ufuk Topcu

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 4273-4279. https://doi.org/10.24963/ijcai.2017/597

Omega-regular objectives in Markov decision processes (MDPs) reduce to reachability: find a policy which maximizes the probability of reaching a target set of states. Given an MDP, an initial distribution, and a target set of states, such a policy can be computed by most probabilistic model checking tools. If the MDP is only partially specified, i.e., some probabilities are unknown, then model-learning techniques can be used to statistically approximate the probabilities and enable the computation of the desired policy. For fully specified MDPs, reducing the size of the MDP translates into faster model checking; for partially specified MDPs, into faster learning. We provide reduction techniques that allow us to remove irrelevant transition probabilities: transition probabilities (known, or to be learned) that do not influence the maximal reachability probability. Among other applications, these reductions can be seen as a pre-processing of MDPs before model checking or as a way to reduce the number of experiments required to obtain a good approximation of an unknown MDP.
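For illustration only (this sketch is not taken from the paper), the maximal reachability probability referred to in the abstract can be computed by standard value iteration on a fully specified MDP: iterate x(s) := max over actions a of the sum over successors s' of P(s,a,s') * x(s'), starting from 1 on the target and 0 elsewhere. The MDP, state names, and probabilities below are hypothetical.

    # Minimal, assumed example: value iteration for Pr^max(reach target).
    # transitions[state][action] = list of (next_state, probability) pairs.
    transitions = {
        "s0": {"a": [("s1", 0.6), ("s2", 0.4)], "b": [("s2", 1.0)]},
        "s1": {"a": [("goal", 0.9), ("s0", 0.1)]},
        "s2": {"a": [("s2", 1.0)]},          # sink: the goal is unreachable from here
        "goal": {"a": [("goal", 1.0)]},
    }
    target = {"goal"}

    def max_reachability(transitions, target, iterations=1000, tol=1e-10):
        """Approximate Pr^max(reach target) from every state by value iteration."""
        values = {s: (1.0 if s in target else 0.0) for s in transitions}
        for _ in range(iterations):
            new_values = {}
            for s, acts in transitions.items():
                if s in target:
                    new_values[s] = 1.0
                    continue
                # The best action maximizes the expected value of the successors.
                new_values[s] = max(
                    sum(p * values[t] for t, p in succ) for succ in acts.values()
                )
            if max(abs(new_values[s] - values[s]) for s in values) < tol:
                values = new_values
                break
            values = new_values
        return values

    print(max_reachability(transitions, target))
    # Here values["s0"] is roughly 0.574 (play "a" in s0 and s1). Action "b" and
    # the probabilities inside the sink "s2" never influence this maximum, which is
    # the kind of irrelevant transition probability the reductions aim to remove.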
Keywords:
Planning and Scheduling: Markov Decisions Processes
Uncertainty in AI: Sequential Decision Making
Multidisciplinary Topics and Applications: Validation and Verification
Uncertainty in AI: Uncertainty in AI