Exploiting Multi-Modal Interactions: A Unified Framework

Given an imagebase of tagged images, four types of tasks can be performed: content-based image retrieval, image annotation, text-based image retrieval, and query expansion. For each of these tasks, a reliable similarity measure on the relevant type of objects is essential. In this paper, we propose a framework that tackles all four tasks from a unified view. The essence of the framework is to estimate similarities by exploiting the interactions between objects of different modalities. Experiments show that the proposed method improves similarity estimation, and that with the improved similarities, simple methods can outperform some state-of-the-art techniques.
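To make the cross-modal idea concrete, the following is a minimal sketch (not the paper's actual algorithm) of how similarities in one modality can be refined through the other: starting from a binary image-tag incidence matrix, image-image similarity and tag-tag similarity are re-estimated through each other, so two images become similar if they share similar tags, and vice versa. The function name and iteration scheme here are illustrative assumptions.

```python
import numpy as np

def cross_modal_similarity(A, n_iter=10):
    """Refine image-image and tag-tag similarities from a binary
    image-tag incidence matrix A (n_images x n_tags) by propagating
    similarity across the two modalities (illustrative sketch only).
    """
    n_img, n_tag = A.shape
    S_img = np.eye(n_img)  # start from identity: self-similarity only
    S_tag = np.eye(n_tag)
    for _ in range(n_iter):
        # images are similar if connected to similar tags, and vice versa
        S_img = A @ S_tag @ A.T
        S_tag = A.T @ S_img @ A
        # rescale so similarities stay in [0, 1]
        S_img = S_img / max(S_img.max(), 1e-12)
        S_tag = S_tag / max(S_tag.max(), 1e-12)
    return S_img, S_tag
```

Once such cross-modal similarities are available, each of the four tasks reduces to ranking with the appropriate similarity matrix (image-image for content-based retrieval, tag-tag for query expansion, and the incidence structure for annotation and text-based retrieval).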

Ming Li, Xiao-Bing Xue, Zhi-Hua Zhou