Maximum Entropy Semi-Supervised Inverse Reinforcement Learning / 3315
Julien Audiffren, Michal Valko, Alessandro Lazaric, Mohammad Ghavamzadeh
A popular approach to apprenticeship learning (AL) is to formulate it as an inverse reinforcement learning (IRL) problem. The MaxEnt-IRL algorithm successfully integrates the maximum entropy principle into IRL and unlike its predecessors, it resolves the ambiguity arising from the fact that a possibly large number of policies could match the expert's behavior. In this paper, we study an AL setting in which in addition to the expert's trajectories,a number of unsupervised trajectories is available. We introduce MESSI,a novel algorithm that combines MaxEnt-IRL with principles coming from semisupervised learning. In particular, MESSI integrates the unsupervised data into the MaxEnt-IRL framework using a pairwise penalty on trajectories. Empirical results in a highway driving and grid-world problems indicate that MESSI is able to take advantage of the unsupervised trajectories and improve the performance of MaxEnt-IRL.