Bayesian Experience Reuse for Learning from Multiple Demonstrators

Mike Gimelfarb, Scott Sanner, Chi-Guhn Lee

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 2425-2431. https://doi.org/10.24963/ijcai.2021/334

Learning from Demonstrations (LfD) is a powerful approach for incorporating advice from experts in the form of demonstrations. However, demonstrations often come from multiple sub-optimal experts with conflicting goals, rendering them difficult to incorporate effectively in online settings. To address this, we formulate a quadratic program whose solution yields an adaptive weighting over experts that can be used to sample experts with relevant goals. To compare different source and target task goals safely, we model their uncertainty using normal-inverse-gamma priors, whose posteriors are learned from demonstrations using Bayesian neural networks with a shared encoder. Our resulting approach, which we call Bayesian Experience Reuse, can be applied for LfD in static and dynamic decision-making settings. We demonstrate its effectiveness for minimizing multi-modal functions and for optimizing a high-dimensional supply chain with cost uncertainty, where it is also shown to improve upon the performance of the demonstrators' policies.
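As a generic illustration of the normal-inverse-gamma model mentioned in the abstract, the textbook conjugate update for scalar observations with unknown mean and variance can be sketched as follows. This is not the paper's implementation, which learns posteriors with Bayesian neural networks; the function name `nig_update`, the parameter names, and the prior values are hypothetical choices made for this sketch.

```python
from typing import NamedTuple, Sequence


class NIG(NamedTuple):
    """Normal-inverse-gamma parameters (hypothetical container for this sketch)."""
    mu: float     # mean of the Gaussian over the unknown mean
    lam: float    # pseudo-count (precision scaling) for the mean
    alpha: float  # shape of the inverse-gamma over the unknown variance
    beta: float   # scale of the inverse-gamma over the unknown variance


def nig_update(prior: NIG, data: Sequence[float]) -> NIG:
    """Standard conjugate posterior for i.i.d. normal data under an NIG prior."""
    n = len(data)
    if n == 0:
        return prior
    xbar = sum(data) / n
    sq_dev = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations
    lam_n = prior.lam + n
    mu_n = (prior.lam * prior.mu + n * xbar) / lam_n
    alpha_n = prior.alpha + n / 2.0
    beta_n = (
        prior.beta
        + 0.5 * sq_dev
        + prior.lam * n * (xbar - prior.mu) ** 2 / (2.0 * lam_n)
    )
    return NIG(mu_n, lam_n, alpha_n, beta_n)


# Assumed prior and toy observations, chosen only for illustration.
posterior = nig_update(NIG(mu=0.0, lam=1.0, alpha=1.0, beta=1.0), [1.0, 2.0, 3.0])
print(posterior)
```

The posterior keeps the same four-parameter form as the prior, so uncertainty about both the mean and the variance can be carried forward and compared across data sources in closed form.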
Keywords:
Machine Learning: Deep Reinforcement Learning
Machine Learning: Transfer, Adaptation, Multi-task Learning
Uncertainty in AI: Approximate Probabilistic Inference
Uncertainty in AI: Bayesian Networks