Domain Adaptation for Learning from Label Proportions Using Self-Training
Ehsan Mohammady Ardehaly, Aron Culotta
Learning from Label Proportions (LLP) is a machine learning problem in which the training data consist of bags of instances, and only the class label distribution for each bag is known. In some domains label proportions are readily available; for example, by grouping social media users by location, one can use census statistics to build a classifier for user demographics. However, label proportions are unavailable in many domains, such as product review sites. The goal of this paper is to determine whether an LLP classifier fit in one domain can be modified to classify instances from another domain. To do so, we propose a domain adaptation algorithm that uses an LLP model fit on the source domain to generate label proportions for the target domain. A new LLP model is then fit on the target domain, and this self-training process is repeated to adapt the model from source to target. Our experiments on five diverse tasks indicate an 11% average absolute improvement in accuracy as compared to using LLP without domain adaptation. In contrast to existing domain adaptation algorithms, our approach requires only label proportions in the source domain, and the results suggest that the approach is effective even when the target domain is substantially different from the source domain.
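The self-training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a binary task, a logistic model trained by matching each bag's mean predicted probability to the bag's label proportion, and NumPy. The names `fit_llp`, `predict_proportions`, and `self_train`, and all hyperparameter values, are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Clip to avoid overflow in exp for extreme scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_llp(bags, proportions, n_iter=500, lr=0.5, l2=1e-3):
    """Fit logistic weights so that each bag's mean predicted probability
    approaches its known positive-class proportion (squared-error loss).

    bags: list of arrays, each of shape (n_instances, n_features)
    proportions: list of floats in [0, 1], one per bag
    """
    w = np.zeros(bags[0].shape[1])
    for _ in range(n_iter):
        grad = np.zeros_like(w)
        for X, p in zip(bags, proportions):
            q = sigmoid(X @ w)
            # Gradient of 0.5 * (mean(q) - p)^2 with respect to w.
            grad += (q.mean() - p) * (X * (q * (1.0 - q))[:, None]).mean(axis=0)
        w -= lr * (grad / len(bags) + l2 * w)
    return w


def predict_proportions(w, bags):
    """Estimate each bag's positive-class proportion from instance predictions."""
    return [float(sigmoid(X @ w).mean()) for X in bags]


def self_train(source_bags, source_props, target_bags, n_rounds=3):
    """Adapt an LLP model from source to target by self-training.

    1. Fit an LLP model on source bags with known proportions.
    2. Use it to generate label proportions for the (unlabeled) target bags.
    3. Fit a new LLP model on the target bags with those proportions.
    4. Repeat steps 2-3 for n_rounds (an assumed stopping rule).
    """
    w = fit_llp(source_bags, source_props)
    for _ in range(n_rounds):
        estimated = predict_proportions(w, target_bags)
        w = fit_llp(target_bags, estimated)
    return w
```

Here the target bags carry no labels or proportions of their own; the only supervision is the source-domain proportions passed to the first `fit_llp` call. The fixed number of rounds and the refit from zero weights in each round are simplifications chosen for this sketch.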