Efficiently Exploiting Symmetries in Real Time Dynamic Programming

Shravan M. Narayanamurthy, Balaraman Ravindran.

Current approaches to solving Markov Decision Processes (MDPs) are sensitive to the size of the MDP. When applied to real world problems though, MDPs exhibit considerable implicit redundancy, especially in the form of symmetries. Existing model minimization methods do not exploit this redundancy due to symmetries well. In this work, given such symmetries, we present a time-efficient algorithm to construct a functionally equivalent reduced model of the MDP. Further, we present a Real Time Dynamic Programming (RTDP) algorithm which obviates an explicit construction of the reduced model by integrating the given symmetries into it. The RTDP algorithm solves the reduced model, while working with parameters of the original model and the given symmetries. As RTDP uses its experience to determine which states to backup, it focuses on parts of the reduced state set that are most relevant. This results in significantly faster learning and a reduced overall execution time. The algorithms proposed are particularly effective in the case of structured automorphisms even when the reduced model does not have fewer features. We demonstrate the results empirically on several domains.