Multiple Information Sources Cooperative Learning

Many applications are facing the problem of learning from an objective dataset, whereas information from other auxiliary sources may be beneficial but cannot be integrated into the objective dataset for learning. In this paper, we propose an omni-view learning approach to enable learning from multiple data collections. The theme is to organize heterogeneous data sources into a unified table with global data view. To achieve the omni-view learning goal, we consider that the objective dataset and the auxiliary datasets share some instance-level dependency structures. We then propose a relational k-means to cluster instances in each auxiliary dataset, such that clusters can help build new features to capture correlations between the objective and auxiliary datasets. Experimental results demonstrate that omni-view learning can help build models which outperform the ones learned from the objective dataset only. Comparisons with the co-training algorithm further assert that omni-view learning provides an alternative, yet effective, way for semi-supervised learning.

Xingquan Zhu, Ruoming Jin