On Combining Side Information and Unlabeled Data for Heterogeneous Multi-Task Metric Learning / 1809
Yong Luo, Yonggang Wen, Dacheng Tao
Distance metric learning (DML) is critical for a wide variety of machine learning algorithms and pattern recognition applications. Transfer metric learning (TML) leverages side information (e.g., similar/dissimilar constraints over pairs of samples) from related domains to help learn the target metric when the target domain has only limited information. Current TML tools usually assume that different domains share the same feature representation, and are thus not applicable to tasks where the data are drawn from heterogeneous domains. Heterogeneous transfer learning approaches usually handle heterogeneous domains by learning feature transformations across the domains. The learned transformations can be used to derive a metric, but most of these approaches can handle only two domains. This motivates the proposed heterogeneous multi-task metric learning (HMTML) framework, which handles multiple domains by combining side information and unlabeled data. Specifically, HMTML learns the metrics for all domains simultaneously by maximizing their high-order correlation (parameterized by the feature covariance of unlabeled data) in a common subspace, which is induced by the transformations derived from the metrics. Extensive experiments on both multi-language text categorization and multi-view social image annotation demonstrate the effectiveness of the proposed method.
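The link between a metric and a transformation mentioned above can be illustrated with a minimal sketch. It is not the HMTML algorithm; it only shows the standard fact that a positive semidefinite metric of the form M = U U^T gives a Mahalanobis distance equal to the Euclidean distance after projecting with U^T, so domains with different feature dimensions can be mapped to a subspace of the same dimension. The matrices `U1` and `U2`, the sample vectors, and all function names are hypothetical values chosen for illustration.

```python
import math

def project(U, x):
    """Map x (length d) into the r-dimensional subspace: z = U^T x."""
    d, r = len(U), len(U[0])
    return [sum(U[i][k] * x[i] for i in range(d)) for k in range(r)]

def metric_from_transform(U):
    """Build the PSD metric M = U U^T (d x d) from a d x r transformation U."""
    d, r = len(U), len(U[0])
    return [[sum(U[i][k] * U[j][k] for k in range(r)) for j in range(d)]
            for i in range(d)]

def mahalanobis(M, x, y):
    """Distance sqrt((x - y)^T M (x - y)) under the metric M."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    q = sum(diff[i] * M[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative rounding errors

def euclidean(a, b):
    """Plain Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical transformations for two domains with 3 and 2 features,
# both mapping into a common 2-dimensional subspace (assumed values).
U1 = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
U2 = [[2.0, 1.0], [0.0, 1.0]]

x, y = [1.0, 2.0, 3.0], [0.0, 1.0, 1.0]   # two samples from domain 1
M1 = metric_from_transform(U1)

d_metric = mahalanobis(M1, x, y)                      # distance under M1
d_proj = euclidean(project(U1, x), project(U1, y))    # distance after U1^T

z1 = project(U1, x)             # domain-1 sample in the common subspace
z2 = project(U2, [1.0, -1.0])   # domain-2 sample in the same subspace
```

Here `d_metric` and `d_proj` agree up to floating-point error, and `z1` and `z2` have the same length even though the two domains have different input dimensions. How HMTML actually couples the metrics through the covariance of unlabeled data is not specified in the abstract and is not modeled here.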