Topological Value Iteration Algorithm for Markov Decision Processes

Peng Dai, Judy Goldsmith

Abstract

Value iteration is an inefficient algorithm for Markov decision processes (MDPs) because it puts the majority of its effort into backing up the entire state space, much of which turns out to be unnecessary. Many approaches have been proposed to overcome this problem; among them, LAO*, LRTDP, and HDP are state of the art. All of these use reachability analysis and heuristics to avoid some unnecessary backups. However, none of these approaches fully exploits the graphical features of an MDP or uses those features to derive the best backup sequence over the state space. We introduce an algorithm named Topological Value Iteration (TVI) that circumvents the problem of unnecessary backups by detecting the structure of the MDP and backing up states according to a topological order. We prove that the backup sequence TVI applies is optimal. Our experimental results show that TVI outperforms VI, LAO*, LRTDP, and HDP on our benchmark MDPs.
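As a rough illustration of the idea summarized above, the Python sketch below decomposes an MDP's state graph into strongly connected components and runs Bellman backups one component at a time, visiting a component only after all of its successor components have converged. This is only a minimal sketch, not the paper's implementation: the MDP encoding, the function names, the choice of Tarjan's algorithm for component detection, and the toy chain example are assumptions made for this illustration.

# A minimal Python sketch of topological value iteration, assuming a stochastic
# shortest-path MDP encoded as mdp[state][action] = [(next_state, prob, cost)].
# Encoding, names, and the toy example below are illustrative assumptions.
from collections import defaultdict


def strongly_connected_components(graph):
    """Tarjan's algorithm. SCCs are returned in reverse topological order of
    the condensation DAG: every SCC appears after all SCCs it can reach."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def topological_value_iteration(mdp, goal, epsilon=1e-6):
    """Back up each strongly connected component to convergence, visiting a
    component only after all of its successor components have converged."""
    # Reachability graph: s -> s' if some action reaches s' with positive probability.
    graph = {s: set() for s in mdp}
    for s, actions in mdp.items():
        for outcomes in actions.values():
            for s2, p, _ in outcomes:
                if p > 0:
                    graph[s].add(s2)
                    graph.setdefault(s2, set())

    values = defaultdict(float)  # zero is an admissible initial estimate

    def backup(s):
        if s == goal or not mdp.get(s):
            return 0.0
        return min(sum(p * (c + values[s2]) for s2, p, c in outcomes)
                   for outcomes in mdp[s].values())

    for component in strongly_connected_components(graph):
        while True:  # plain value iteration restricted to this component
            residual = 0.0
            for s in component:
                new_value = backup(s)
                residual = max(residual, abs(new_value - values[s]))
                values[s] = new_value
            if residual < epsilon:
                break
    return dict(values)


if __name__ == "__main__":
    # Hypothetical 4-state chain with a noisy action; "G" is the goal.
    mdp = {
        "s0": {"go": [("s1", 0.9, 1.0), ("s0", 0.1, 1.0)]},
        "s1": {"go": [("s2", 0.9, 1.0), ("s1", 0.1, 1.0)]},
        "s2": {"go": [("G", 1.0, 1.0)]},
        "G": {},
    }
    print(topological_value_iteration(mdp, goal="G"))

Because each component is solved to convergence before any of its predecessors are touched, no state is backed up after the values it depends on have already stabilized, which is the intuition behind the optimal backup sequence claimed in the abstract.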