Explanation-Based Feature Construction

Shiau Hong Lim, Li-Lun Wang, Gerald DeJong

Choosing good features to represent objects can be crucial to the success of supervised machine learning algorithms. Good high-level features are those that concentrate information relevant to the classification task. Such features can often be constructed as nonlinear combinations of raw or native input features, such as the pixels of an image. Using many nonlinear combinations, as SVMs do, can dilute the classification information, necessitating many training examples. On the other hand, searching even a modestly expressive space of nonlinear functions for high-information ones can be intractable. We describe an approach to feature construction in which task-relevant discriminative features are constructed automatically, guided by an explanation-based interaction between training examples and prior domain knowledge. We show that on the challenging task of distinguishing handwritten Chinese characters, our automatic feature-construction approach performs particularly well on the most difficult and complex character pairs.
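As a rough illustration of how quickly a space of nonlinear feature combinations grows (this arithmetic is ours, not the authors' method, and the 64x64 bitmap size is an assumption chosen only for illustration): the number of monomials of total degree at most d over n raw features is C(n + d, d), which is also the dimension of the feature space implied by an inhomogeneous polynomial kernel of degree d. The helper name `num_monomials` is hypothetical.

```python
from math import comb


def num_monomials(n_features: int, max_degree: int) -> int:
    """Count monomials of total degree <= max_degree in n_features variables.

    This equals C(n + d, d), the dimension of the feature space implied by an
    inhomogeneous polynomial kernel (x . y + 1) ** d.
    """
    return comb(n_features + max_degree, max_degree)


# Hypothetical input size: a 64x64 grayscale character bitmap (an assumption).
n_pixels = 64 * 64

for d in (1, 2, 3):
    print(f"degree <= {d}: {num_monomials(n_pixels, d):,} candidate features")
# degree <= 1: 4,097 candidate features
# degree <= 2: 8,394,753 candidate features
# degree <= 3: 11,470,030,849 candidate features
```

Already at degree 3 there are more than ten billion candidate products of pixels, which suggests why neither using all of them nor searching among them blindly is attractive.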