Run-Time Improvement of Point-Based POMDP Policies / 2408
Minlue Wang, Richard Dearden
The most successful recent approaches to partially observable Markov decision problem (POMDP) solving have largely been point-based approximation algorithms. These work by selecting a finite number of belief points, computing alpha-vectors for those points, and using the resulting policy everywhere. However, if during execution the belief state is far from the points, there is no guarantee that the policy will be good. This case occurs either when the points are chosen poorly or there are too few points to capture the whole optimal policy, for example in domains where there are many low probability transitions, such as faults or exogenous events. In this paper we explore the use of an on-line plan repair approach to overcome this difficulty. The idea is to split computation between off-line plan creation and, if necessary, on-line plan repair. We evaluate a variety of heuristics used to determine when plan repair might be useful, and then repair the plan by sampling a small number of additional belief points and recomputing the policy. We show in several domains that the approach is more effective than either off-line planning alone even with much more computation time, or a purely on-line planning based on forward search. We also show that the overhead of checking the heuristics is very small when replanning is unnecessary.