The Dependence of Effective Planning Horizon on Model Accuracy / 4180
Nan Jiang, Alex Kulesza, Satinder Singh, Richard Lewis
Because planning with a long horizon (i.e., looking far into the future) is computationally expensive, it is common in practice to save time by using reduced horizons. This is usually understood to come at the expense of computing suboptimal plans, which is the case when the planning model is exact. However, when the planning model is estimated from data, as is frequently true in the real world, the policy found using a shorter planning horizon can actually be better than a policy learned with the true horizon. In this paper we provide a precise explanation for this phenomenon based on principles of learning theory. We show formally that the planning horizon is a complexity control parameter for the class of policies available to the planning algorithm, having an intuitive, monotonic relationship with a simple measure of complexity. We prove a planning loss bound predicting that shorter planning horizons can reduce overfitting and improve test performance, and we confirm these predictions empirically.