Embodied Multimodal Multitask Learning

Devendra Singh Chaplot, Lisa Lee, Ruslan Salakhutdinov, Devi Parikh, Dhruv Batra

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 2442-2448. https://doi.org/10.24963/ijcai.2020/338

Visually-grounded embodied language learning models have recently been shown to be effective at learning multiple multimodal tasks, such as following navigational instructions and answering questions. In this paper, we address two key limitations of these models: (a) the inability to transfer grounded knowledge across different tasks, and (b) the inability to transfer to new words and concepts not seen during training using only a few examples. We propose a multitask model which facilitates knowledge transfer across tasks by disentangling the knowledge of words and visual attributes in the intermediate representations. We create scenarios and datasets to quantify cross-task knowledge transfer and show that the proposed model outperforms a range of baselines in simulated 3D environments. We also show that this disentanglement of representations makes our model modular and interpretable, which allows for transfer to instructions containing new concepts.
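To illustrate the general idea of disentangling word and visual-attribute knowledge in an intermediate representation, below is a minimal PyTorch-style sketch. It is not the authors' architecture; the module, its parameters, and the word-to-attribute attention scheme are illustrative assumptions. Each instruction word attends over a set of visual attribute channels, so its contribution to the fused representation remains separable and can be inspected or reused across tasks.

```python
import torch
import torch.nn as nn


class DisentangledFusion(nn.Module):
    """Hypothetical sketch: fuse instruction words with visual attribute
    channels via a word-attribute attention map, keeping each word's
    contribution separable in the intermediate representation."""

    def __init__(self, vocab_size, word_dim=32, num_attr=64, feat_channels=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # Project each word embedding to scores over visual attribute channels.
        self.word_to_attr = nn.Linear(word_dim, num_attr)
        # Map raw image features to per-attribute activation maps.
        self.visual_attr = nn.Conv2d(feat_channels, num_attr, kernel_size=1)

    def forward(self, image_feats, instruction_tokens):
        # image_feats: (B, C, H, W); instruction_tokens: (B, T)
        attr_maps = self.visual_attr(image_feats)                 # (B, A, H, W)
        words = self.word_emb(instruction_tokens)                 # (B, T, D)
        attn = torch.softmax(self.word_to_attr(words), dim=-1)    # (B, T, A)
        # Weight attribute maps by each word's attention, then pool over words.
        fused = torch.einsum('bta,bahw->bthw', attn, attr_maps)   # (B, T, H, W)
        return fused.mean(dim=1)                                  # (B, H, W)


if __name__ == "__main__":
    module = DisentangledFusion(vocab_size=100)
    img = torch.randn(2, 64, 7, 7)
    tokens = torch.randint(0, 100, (2, 5))
    print(module(img, tokens).shape)  # torch.Size([2, 7, 7])
```

Because the word-attribute attention is an explicit, interpretable map, a new word can in principle be handled by learning only its attention over existing attribute channels, which is the flavor of few-shot concept transfer the abstract refers to.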
Keywords:
Machine Learning: Deep Reinforcement Learning
Machine Learning: Transfer, Adaptation, Multi-task Learning
Machine Learning Applications: Applications of Reinforcement Learning