Cross-Media Shared Representation by Hierarchical Learning with Multiple Deep Networks / 3846
Yuxin Peng, Xin Huang, Jinwei Qi
Inspired by the progress of deep neural network (DNN) in single-media retrieval, the researchers have applied the DNN to cross-media retrieval. These methods are mainly two-stage learning: the first stage is to generate the separate representation for each media type, and the existing methods only model the intra-media information but ignore the inter-media correlation with the rich complementary context to the intra-media information. The second stage is to get the shared representation by learning the cross-media correlation, and the existing methods learn the shared representation through a shallow network structure, which cannot fully capture the complex cross-media correlation. For addressing the above problems, we propose the cross-media multiple deep network (CMDN) to exploit the complex cross-media correlation by hierarchical learning. In the first stage, CMDN jointly models the intra-media and inter-media information for getting the complementary separate representation of each media type. In the second stage, CMDN hierarchically combines the inter-media and intra-media representations to further learn the rich cross-media correlation by a deeper two-level network strategy, and finally get the shared representation by a stacked network style. Experiment results show that CMDN achieves better performance comparing with several state-of-the-art methods on 3 extensively used cross-media datasets.