PG3: Policy-Guided Planning for Generalized Policy Generation

Ryan Yang; Tom Silver; Aidan Curtis; Tomas Lozano-Perez; Leslie Kaelbling

doi:10.24963/ijcai.2022/650

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence

Main Track. Pages 4686-4692. https://doi.org/10.24963/ijcai.2022/650

A longstanding objective in classical planning is to synthesize policies that generalize across multiple problems from the same domain. In this work, we study methods based on generalized policy search, with a focus on the score function used to guide the search over policies. We demonstrate limitations of two score functions, policy evaluation and plan comparison, and propose a new approach that overcomes these limitations. The main idea behind our approach, Policy-Guided Planning for Generalized Policy Generation (PG3), is that a candidate policy should be used to guide planning on training problems as a mechanism for evaluating that candidate. Theoretical results in a simplified setting give conditions under which PG3 is optimal or admissible. We then study a specific instantiation of policy search in which planning problems are PDDL-based and policies are lifted decision lists. Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines.
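
The central idea above, using a candidate policy to guide planning on training problems and scoring the candidate by the outcome, can be illustrated with a small Python sketch. This is an illustrative assumption, not the authors' implementation. The search strategy (breadth-first search that also rolls out the policy from each expanded node), the mismatch-counting score, the failure penalty, and every name used (`policy_guided_plan`, `pg3_style_score`, `make_line_problem`) are hypothetical. A toy integer-line domain stands in for the PDDL problems and lifted decision lists studied in the paper.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

# Illustrative sketch only: search strategy, score, and names are assumptions,
# not code from the PG3 paper.
State = Hashable
Action = str
# Hypothetical problem encoding: (initial state, goal test, successor function).
Problem = Tuple[State, Callable[[State], bool],
                Callable[[State], List[Tuple[Action, State]]]]
# A policy maps a state to an action, or None when no rule applies.
Policy = Callable[[State], Optional[Action]]
Plan = List[Tuple[State, Action]]


def _extract_plan(parents: Dict[State, Optional[Tuple[State, Action]]],
                  goal: State) -> Plan:
    """Walk parent pointers back from the goal; return (state, action) steps."""
    steps: Plan = []
    node = goal
    while parents[node] is not None:
        prev, act = parents[node]
        steps.append((prev, act))
        node = prev
    steps.reverse()
    return steps


def policy_guided_plan(problem: Problem, policy: Policy, horizon: int = 10,
                       max_expansions: int = 10_000) -> Optional[Plan]:
    """Breadth-first search where every expanded node also seeds a policy rollout."""
    init, is_goal, successors = problem
    if is_goal(init):
        return []
    parents: Dict[State, Optional[Tuple[State, Action]]] = {init: None}
    frontier = deque([init])

    def add(parent: State, act: Action, child: State) -> bool:
        """Record a newly reached state; return True if it satisfies the goal."""
        if child in parents:
            return False
        parents[child] = (parent, act)
        frontier.append(child)
        return is_goal(child)

    expansions = 0
    while frontier and expansions < max_expansions:
        state = frontier.popleft()
        expansions += 1
        # 1) Follow the candidate policy from this node for up to `horizon` steps.
        cur = state
        for _ in range(horizon):
            act = policy(cur)
            nxt = dict(successors(cur)).get(act) if act is not None else None
            if nxt is None:  # no applicable or executable policy action here
                break
            if add(cur, act, nxt):
                return _extract_plan(parents, nxt)
            cur = nxt
        # 2) Ordinary breadth-first expansion, so search does not rely on the policy alone.
        for act, nxt in successors(state):
            if add(state, act, nxt):
                return _extract_plan(parents, nxt)
    return None


def pg3_style_score(problems: Sequence[Problem], policy: Policy,
                    failure_penalty: int = 1000) -> int:
    """Lower is better: count plan steps where the policy disagrees with the plan."""
    score = 0
    for problem in problems:
        plan = policy_guided_plan(problem, policy)
        if plan is None:
            score += failure_penalty  # no plan found even with policy guidance
            continue
        score += sum(1 for state, act in plan if policy(state) != act)
    return score


def make_line_problem(init: int, goal: int, low: int = 0, high: int = 10) -> Problem:
    """Toy domain: move along the integers in [low, high] with 'inc' and 'dec'."""
    def successors(s: int) -> List[Tuple[Action, int]]:
        out = []
        if s + 1 <= high:
            out.append(("inc", s + 1))
        if s - 1 >= low:
            out.append(("dec", s - 1))
        return out
    return (init, lambda s: s == goal, successors)


if __name__ == "__main__":
    train = [make_line_problem(2, 5), make_line_problem(0, 7)]
    print(pg3_style_score(train, lambda s: "inc"))  # 0
    print(pg3_style_score(train, lambda s: "dec"))  # 10
```

Under these assumptions, a policy that always moves toward the goal scores 0 on the toy training set. A policy that points the wrong way, or has no applicable rule, is charged one unit per plan step where it disagrees with the plan. Running rollouts inside the search lets a partially correct policy still shorten planning while its disagreements remain measurable. A problem that cannot be solved at all incurs the (hypothetical) fixed penalty.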

Keywords:

Planning and Scheduling: Learning in Planning and Scheduling

Machine Learning: Relational Learning

Search: Heuristic Search