Robust Adversarial Imitation Learning via Adaptively-Selected Demonstrations

Yunke Wang, Chang Xu, Bo Du

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 3155-3161. https://doi.org/10.24963/ijcai.2021/434

The agent in imitation learning (IL) is expected to mimic the behavior of the expert, and its performance depends heavily on the quality of the given expert demonstrations. However, the assumption that collected demonstrations are optimal does not always hold in real-world tasks, which can seriously degrade the performance of the learned agent. In this paper, we propose a robust method within the framework of Generative Adversarial Imitation Learning (GAIL) to address the imperfect demonstration issue, in which good demonstrations are adaptively selected for training while bad demonstrations are abandoned. Specifically, a binary weight is assigned to each expert demonstration to indicate whether to select it for training. The reward function in GAIL is employed to determine this weight (i.e., a higher reward results in a higher weight). In contrast to existing solutions that require auxiliary information about this weight, we establish the connection between the weight and the model, so that GAIL can be optimized jointly while the latent weight is learned. Besides hard binary weighting, we also propose a soft weighting scheme. Experiments on MuJoCo demonstrate that the proposed method outperforms other GAIL-based methods when dealing with imperfect demonstrations.
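As a rough illustration of the selection mechanism described above, the following PyTorch sketch assigns weights to expert transitions from the GAIL reward. It assumes a hypothetical `discriminator(states, actions)` network returning logits; the reward surrogate -log(1 - D(s, a)), the threshold, and the temperature are illustrative choices, not the paper's exact formulation.

```python
import torch

def demonstration_weights(discriminator, states, actions,
                          threshold=0.5, soft=False, temperature=0.1):
    """Weight expert transitions by their GAIL reward (illustrative sketch)."""
    with torch.no_grad():
        # Discriminator output D(s, a) in (0, 1), and the common
        # GAIL reward surrogate r = -log(1 - D(s, a)).
        d = torch.sigmoid(discriminator(states, actions))
        rewards = -torch.log(1.0 - d + 1e-8)

    if soft:
        # Soft weighting: squash rewards into (0, 1) weights with a
        # temperature-scaled sigmoid centered at the batch mean reward.
        return torch.sigmoid((rewards - rewards.mean()) / temperature)
    # Hard binary weighting: keep transitions whose reward clears the
    # threshold, abandon the rest.
    return (rewards > threshold).float()
```

These weights would then rescale each demonstration's contribution to the discriminator's training objective at every iteration, so the selection adapts as the learned reward function improves.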
Keywords:
Machine Learning: Adversarial Machine Learning
Machine Learning: Unsupervised Learning