Probably Approximately Correct Learning in Stochastic Games with Temporal Logic Specifications / 3630
Min Wen, Ufuk Topcu
We consider a controller synthesis problem in turn-based stochastic games with both a qualitative linear temporal logic (LTL) constraint and a quantitative discounted-sum objective. For each case in which the LTL specification is realizable and can be equivalently transformed into a deterministic Buchi automaton, we show that there always exists a memoryless almost-sure winning strategy that is epsilon-optimal with respect to the discounted-sum objective for any arbitrary positive epsilon. Building on the idea of the R-MAX algorithm, we propose a probably approximately correct (PAC) learning algorithm that can learn such a strategy efficiently in an online manner with a-priori unknown reward functions and unknown transition distributions. To the best of our knowledge, this is the first result on PAC learning in stochastic games with independent quantitative and qualitative objectives.