Learning Latest Classifiers without Additional Labeled Data

Atsutoshi Kumagai, Tomoharu Iwata

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 2039-2045. https://doi.org/10.24963/ijcai.2017/283

In various applications such as spam mail classification, the performance of classifiers deteriorates over time. Although retraining classifiers using labeled data helps to maintain the performance, continuously preparing labeled data is quite expensive. In this paper, we propose a method to learn classifiers by using newly obtained unlabeled data, which are easy to prepare, as well as labeled data collected beforehand. A major reason for the performance deterioration is the emergence of new features that do not appear in the training phase. Another major reason is a change in the distribution between the training and test phases. The proposed method learns the latest classifiers that overcome both problems. With the proposed method, the conditional distribution of new features given existing features is learned using the unlabeled data. In addition, the proposed method estimates the density ratio between the training and test distributions by using the labeled and unlabeled data. We approximate the test-phase classification error of a classifier that exploits new features as well as existing features by incorporating both the conditional distribution of new features and the density ratio simultaneously. By minimizing the approximated error while integrating out new feature values, we obtain a classifier that exploits new features and fits the test phase. The effectiveness of the proposed method is demonstrated with experiments using synthetic and real-world data sets.
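
As a rough illustration of the pipeline described above, the following Python sketch combines the three ingredients: a density ratio, a conditional model of a new feature given existing features, and a weighted error that averages over new feature values. It is not the authors' implementation, and every modeling choice in it is an assumption made for illustration: the density ratio is estimated with a logistic "train vs. test" classifier, the conditional distribution is a linear-Gaussian model with a single new feature, the integral over new feature values is replaced by Monte Carlo sampling, and the classifier is logistic regression with labels in {0, 1}. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, t, sample_weight=None, lr=0.1, n_iter=2000, l2=1e-3):
    """Weighted logistic regression by gradient descent; bias is the last entry."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    sw = np.ones(X.shape[0]) if sample_weight is None else sample_weight
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (sw * (p - t)) / sw.sum() + l2 * w
        w -= lr * grad
    return w

def density_ratio(X_train, X_test):
    """Assumed estimator: p_test(x) / p_train(x) from a domain classifier."""
    X = np.vstack([X_train, X_test])
    d = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    w = fit_logistic(X, d)
    Xb = np.hstack([X_train, np.ones((len(X_train), 1))])
    # odds p(test|x) / p(train|x) = exp(logit), rescaled by the sample sizes
    return (len(X_train) / len(X_test)) * np.exp(Xb @ w)

def fit_conditional(X_old, x_new):
    """Assumed model: x_new | x_old ~ Normal(linear(x_old), sigma^2)."""
    Xb = np.hstack([X_old, np.ones((len(X_old), 1))])
    coef, *_ = np.linalg.lstsq(Xb, x_new, rcond=None)
    sigma = (x_new - Xb @ coef).std()
    return coef, sigma

def fit_latest_classifier(X_train, y_train, X_unl_old, x_unl_new,
                          n_samples=10, seed=0):
    """Minimize a density-ratio-weighted loss, averaging over sampled new features."""
    rng = np.random.default_rng(seed)
    ratio = density_ratio(X_train, X_unl_old)
    coef, sigma = fit_conditional(X_unl_old, x_unl_new)
    Xb = np.hstack([X_train, np.ones((len(X_train), 1))])
    mean_new = Xb @ coef
    # Monte Carlo stand-in for integrating out the unobserved new feature
    X_rep = np.repeat(X_train, n_samples, axis=0)
    new_rep = (np.repeat(mean_new, n_samples)
               + sigma * rng.standard_normal(len(X_rep)))
    X_aug = np.column_stack([X_rep, new_rep])
    return fit_logistic(X_aug, np.repeat(y_train, n_samples),
                        np.repeat(ratio, n_samples))
```

Here `X_train` and `y_train` stand for the labeled data collected beforehand (existing features only), while `X_unl_old` and `x_unl_new` stand for the existing and new feature values of the newly obtained unlabeled data. The returned weight vector applies to the existing features, then the new feature, then a bias term.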
Keywords:
Machine Learning: Classification
Machine Learning: Machine Learning
Machine Learning: Transfer, Adaptation, Multi-task Learning