Learning Predictive State Representations via Monte-Carlo Tree Search / 3192
Yunlong Liu, Hexing Zhu, Yifeng Zeng, Zongxiong Dai
Predictive State Representations (PSRs) are an efficient method for modelling partially observable dynamical systems. They have shown advantages over the latent state-based approaches by using functions of a set of observable quantities, called tests, to represent model states. As a consequence, discovering the set of tests for representing the state is one of the central problems in PSRs. Existing techniques either discover these tests through iterative methods, which can only be applied to some toy problems, or avoid the complex discovery problem by maintaining a very large set of tests, which may be prohibitively expensive. In this paper, with the benefits of Monte-Carlo tree search (MCTS) for finding solutions in complex problems, and by proposing the concept of model entropy for measuring the model accuracy as the evaluation function in MCTS, the discovery problem is formalized as a sequential decision making problem. Then such a decision making problem can be solved using MCTS, a set of tests for representing state can be obtained and the PSR model of the underlying system can be learned straightforwardly. We conduct experiments on several domains including one extremely large domain and the experimental results show the effectiveness of our approach.