Dual Active Learning for Both Model and Data Selection

Ying-Peng Tang; Sheng-Jun Huang

doi:10.24963/ijcai.2021/420

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence

Main Track. Pages 3052-3058. https://doi.org/10.24963/ijcai.2021/420

To learn an effective model with fewer training examples, existing active learning methods typically assume that a target model is given and try to fit it by selecting the most informative examples. However, the best target model can rarely be determined a priori, so performance may be suboptimal even if the data is perfectly selected. To tackle this practical challenge, this paper proposes a novel framework of dual active learning (DUAL) that performs model search and data selection simultaneously. Specifically, an effective method with truncated importance sampling is proposed for Combined Algorithm Selection and Hyperparameter optimization (CASH), which mitigates the model evaluation bias on the labeled data. Further, we propose an active query strategy to label the most valuable examples. On one hand, the strategy favors discriminative data to help CASH search for the best model; on the other hand, it prefers informative examples to accelerate the convergence of winner models. Extensive experiments are conducted on 12 OpenML datasets. The results demonstrate that the proposed method can effectively learn a superior model with fewer labeled examples.
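The abstract does not give the estimator itself, so the following is only a generic sketch of truncated importance sampling, not the paper's exact formulation. It assumes each actively labeled example carries a known sampling probability and that the loss should be estimated under a target distribution over the pool. The function name `truncated_is_estimate` and the parameters `target_probs`, `sampling_probs`, and `tau` are hypothetical names chosen for illustration.

```python
def truncated_is_estimate(losses, target_probs, sampling_probs, tau):
    """Estimate the mean loss under the target distribution from biased samples.

    Each example is weighted by target_prob / sampling_prob, and the weight is
    capped at tau. Capping trades a small bias for lower variance when some
    examples were very unlikely to be queried.
    NOTE: illustrative sketch only; names and signature are assumptions.
    """
    if not (len(losses) == len(target_probs) == len(sampling_probs)):
        raise ValueError("all inputs must have the same length")
    if not losses:
        raise ValueError("at least one labeled example is required")

    total = 0.0
    for loss, p, q in zip(losses, target_probs, sampling_probs):
        weight = min(p / q, tau)  # truncated importance weight
        total += weight * loss
    return total / len(losses)


# Hypothetical toy data: 0/1 losses of one candidate model on four labeled
# examples, a uniform target distribution, and a skewed query distribution.
losses = [1.0, 0.0, 1.0, 0.0]
target_probs = [0.25, 0.25, 0.25, 0.25]
sampling_probs = [0.5, 0.25, 0.05, 0.2]

truncated = truncated_is_estimate(losses, target_probs, sampling_probs, tau=2.0)
untruncated = truncated_is_estimate(losses, target_probs, sampling_probs, tau=float("inf"))
```

In this toy setting the third example has a raw weight of 5, which dominates the untruncated estimate; capping it at `tau = 2.0` limits how much a single rarely sampled example can move the evaluation of a candidate model.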

Keywords:

Machine Learning: Active Learning

Machine Learning: Weakly Supervised Learning