Using Domain Knowledge to Systematically Guide Feature Selection
Feature selection, applied as a pre-processing step, can often improve the effectiveness of machine learning models. However, this step is frequently a purely data-driven process, and overfitting can then produce models that do not reflect the true relationships present in the data set. In this work, we propose leveraging known relationships between variables to constrain and guide feature selection. Drawing on commonalities across domains, we provide a framework that lets the user express model constraints while keeping the feature selection process data-driven and sensitive to the actual relationships in the data.
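The abstract does not say how constraints are expressed or enforced. As a minimal sketch, assuming the simplest kind of constraint (features a domain expert marks as "required" or "forbidden"), greedy forward selection can be wrapped so that the expert fixes the admissible search space while a data-driven score still decides which of the remaining features enter the model. The function `constrained_forward_selection`, its parameters, and the toy scoring function are hypothetical illustrations, not the framework proposed in this work.

```python
from typing import AbstractSet, Callable, FrozenSet, Iterable, List, Optional


def constrained_forward_selection(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    required: AbstractSet[str] = frozenset(),
    forbidden: AbstractSet[str] = frozenset(),
    max_features: Optional[int] = None,
    min_gain: float = 0.0,
) -> List[str]:
    """Greedy forward selection under hypothetical expert constraints.

    required:  features the expert says must always be in the model.
    forbidden: features the expert says must never be used.
    score:     data-driven quality of a subset (higher is better),
               e.g. cross-validated accuracy; supplied by the caller.
    """
    features = list(features)
    conflict = set(required) & set(forbidden)
    if conflict:
        raise ValueError(f"features both required and forbidden: {sorted(conflict)}")

    # Start from the expert-mandated features (kept in input order).
    selected = [f for f in features if f in required]
    # Only features the expert has not excluded may be searched over.
    candidates = [f for f in features if f not in required and f not in forbidden]
    current = score(frozenset(selected))

    while candidates and (max_features is None or len(selected) < max_features):
        # Find the admissible candidate whose addition most improves the score.
        best_f, best_s = None, current
        for f in candidates:
            s = score(frozenset(selected) | {f})
            if s > best_s:
                best_f, best_s = f, s
        # Stop when no candidate improves the score by more than min_gain.
        if best_f is None or best_s - current <= min_gain:
            break
        selected.append(best_f)
        candidates.remove(best_f)
        current = best_s
    return selected


# Toy, made-up scoring function: per-feature "usefulness" minus a size penalty.
# "leak" stands for a feature that scores well only because of target leakage.
WEIGHTS = {"age": 0.5, "dose": 0.4, "noise": 0.05, "leak": 0.9}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(WEIGHTS[f] for f in subset) - 0.1 * len(subset)


FEATURES = ["age", "dose", "noise", "leak"]

if __name__ == "__main__":
    # Unconstrained search greedily picks the leaky feature first.
    print(constrained_forward_selection(FEATURES, toy_score))
    # Expert knowledge: "dose" must be used, "leak" must not.
    print(constrained_forward_selection(
        FEATURES, toy_score, required={"dose"}, forbidden={"leak"}))
```

In this toy run the unconstrained search returns `["leak", "age", "dose"]`, while the constrained search returns `["dose", "age"]`: the constraints remove the spurious feature, but the score still decides that `"noise"` is not worth adding. In practice `score` would be something like a cross-validated metric computed on the actual data.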