Truly Batch Apprenticeship Learning with Deep Successor Features

Donghun Lee, Srivatsan Srinivasan, Finale Doshi-Velez

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
AI for Improving Human Well-being. Pages 5909-5915. https://doi.org/10.24963/ijcai.2019/819

We introduce a novel apprenticeship learning algorithm to learn an expert's underlying reward structure in off-policy, model-free batch settings. Unlike existing methods that require hand-crafted features, on-policy evaluation, further data acquisition for evaluation policies, or knowledge of the model dynamics, our algorithm requires only batch data (demonstrations) of the observed expert behavior. Such settings are common in many real-world tasks (health care, finance, or industrial process control) where accurate simulators do not exist and additional data acquisition is costly. We develop a transition-regularized imitation learning model to learn a rich feature representation and a near-expert initial policy that makes the subsequent batch inverse reinforcement learning process viable. We also introduce deep successor feature networks that perform off-policy evaluation to estimate feature expectations of candidate policies. Under the batch setting, our method achieves superior results on control benchmarks as well as on a real clinical task of sepsis management in the Intensive Care Unit.
Keywords:
Special Track on AI for Improving Human Well-being: Health applications (Special Track on AI and Human Wellbeing)
Special Track on AI for Improving Human Well-being: AI safety (Special Track on AI and Human Wellbeing)
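
The abstract's idea of using successor features for off-policy evaluation can be illustrated with a minimal tabular sketch. This is not the paper's deep network implementation: it swaps the network for a lookup table and assumes a small discrete state and action space, deterministic evaluation policies, and a fixed batch of (state, action, next state) transitions. All names (`fit_successor_features`, `feature_expectations`, `one_hot`, `TOY_BATCH`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# One logged transition: (state, action, next_state). No reward is needed,
# because successor features accumulate features rather than rewards.
Transition = Tuple[int, int, int]


def fit_successor_features(
    batch: Sequence[Transition],
    policy: Sequence[int],
    phi: Callable[[int], List[float]],
    n_states: int,
    n_actions: int,
    gamma: float = 0.9,
    lr: float = 0.5,
    sweeps: int = 1000,
) -> List[List[List[float]]]:
    """Tabular TD estimate of psi(s, a) = phi(s) + gamma * psi(s', policy(s')).

    The batch may come from any behavior policy; only the bootstrap target
    uses the evaluation policy, which is what makes the estimate off-policy.
    """
    dim = len(phi(0))
    psi = [[[0.0] * dim for _ in range(n_actions)] for _ in range(n_states)]
    for _ in range(sweeps):
        for s, a, s_next in batch:
            features = phi(s)
            next_psi = psi[s_next][policy[s_next]]
            for k in range(dim):
                target = features[k] + gamma * next_psi[k]
                psi[s][a][k] += lr * (target - psi[s][a][k])
    return psi


def feature_expectations(
    psi: List[List[List[float]]],
    policy: Sequence[int],
    start_states: Sequence[int],
) -> List[float]:
    """Average psi(s0, policy(s0)) over the given start states."""
    dim = len(psi[0][0])
    mu = [0.0] * dim
    for s0 in start_states:
        for k in range(dim):
            mu[k] += psi[s0][policy[s0]][k] / len(start_states)
    return mu


def one_hot(state: int) -> List[float]:
    """Assumed feature map for the toy example: one indicator per state."""
    return [1.0 if state == i else 0.0 for i in range(2)]


# Toy batch over two states: action 0 stays put, action 1 switches state.
TOY_BATCH: List[Transition] = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

For the toy batch, evaluating the policy that always takes action 0 from start state 0 with `gamma=0.9` gives feature expectations close to `[10.0, 0.0]`, since the agent stays in state 0 and accumulates the discounted sum 1 / (1 - 0.9). Under a linear reward with weights `w`, the policy's value is then the dot product of `w` with these feature expectations, which is the quantity an apprenticeship learning loop would compare against the expert's.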