Transfer Learning in Multi-Armed Bandits: A Causal Approach

Junzhe Zhang, Elias Bareinboim

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 1340-1346. https://doi.org/10.24963/ijcai.2017/186

Reinforcement learning (RL) agents have been deployed in complex environments where interactions are costly, and learning is usually slow. One prominent task in these settings is to reuse interactions performed by other agents to accelerate the learning process. Causal inference provides a family of methods to infer the effects of actions from a combination of data and qualitative assumptions about the underlying environment. Despite its success in transferring invariant knowledge across domains in the empirical sciences, causal inference has not been fully realized in the context of transfer learning in interactive domains. In this paper, we use causal inference as a basis to support a principled and more robust transfer of knowledge in RL settings. In particular, we tackle the problem of transferring knowledge across bandit agents in settings where causal effects cannot be identified by do-calculus [Pearl, 2000] and standard learning techniques. Our new identification strategy combines two steps: first, deriving bounds over the arm's distribution based on structural knowledge; second, incorporating these bounds into a dynamic allocation procedure that guides the search towards more promising actions. We formally prove that our strategy dominates previously known algorithms, achieving convergence rates that are orders of magnitude faster. Finally, we perform simulations and empirically demonstrate that our strategy is consistently more efficient than the current (non-causal) state-of-the-art methods.
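To make the two-step strategy concrete, below is a minimal Python sketch of the second step: a UCB-style allocation whose exploration indices are clipped by causal upper bounds on each arm's mean reward, assumed to have been derived offline (the first step) from the source agent's data plus structural knowledge. The Hoeffding-style bonus, the function names, and the Bernoulli usage example are illustrative assumptions, not the paper's exact procedure, which may use more refined indices.

```python
import numpy as np

def bounded_ucb(pull, n_arms, horizon, upper_bounds):
    """UCB-style allocation clipped by causal upper bounds.

    pull(a) -> reward in [0, 1] for arm a.
    upper_bounds[a] is an upper bound on arm a's true mean,
    derived offline from transferred (observational) data.
    """
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)

    # Pull each arm once to initialize the empirical means.
    for a in range(n_arms):
        means[a] = pull(a)
        counts[a] = 1

    for t in range(n_arms, horizon):
        # Standard Hoeffding-style exploration bonus...
        bonus = np.sqrt(2 * np.log(t + 1) / counts)
        # ...clipped by the causal upper bounds: arms whose bound
        # already falls below a competitor's index are effectively
        # pruned, which is where the speed-up comes from.
        indices = np.minimum(means + bonus, upper_bounds)
        a = int(np.argmax(indices))
        r = pull(a)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]

    return means, counts

# Hypothetical usage: a 3-armed Bernoulli bandit where transferred
# knowledge rules out arm means above the stated bounds.
rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.8]
pull = lambda a: float(rng.random() < true_means[a])
means, counts = bounded_ucb(pull, 3, 2000, upper_bounds=[0.3, 0.6, 1.0])
```

The design point the sketch illustrates: even when the transferred data cannot point-identify the arms' distributions, bounds alone suffice to shrink the effective action space, so the allocation procedure wastes fewer pulls on arms that the bounds already rule out.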
Keywords:
Knowledge Representation, Reasoning, and Logic: Action, Change and Causality
Uncertainty in AI: Graphical Models
Uncertainty in AI: Sequential Decision Making