Weight Features for Predicting Future Model Performance of Deep Neural Networks
Yasunori Yamada, Tetsuro Morimura
Deep neural networks (DNNs) frequently require careful tuning of model hyperparameters. Recent research has shown that automated early termination of underperforming runs can speed up hyperparameter searches. However, these studies have used only the learning curve to predict the eventual model performance. In this study, we propose using weight features, extracted from network weights at an early stage of the learning process, as explanatory variables for predicting the eventual model performance. We conduct experiments on hyperparameter searches with various types of convolutional neural network architectures on three image datasets and apply the random forest method to predict the eventual model performance. The results show that using the weight features improves predictive performance compared with using the learning curve alone. On all three datasets, the most important feature for the prediction was related to weight changes in the last convolutional layers. Our findings demonstrate that weight features make it possible to construct prediction models from fewer training samples and to terminate underperforming runs at an earlier stage of DNN training than the conventional use of the learning curve allows, thus speeding up hyperparameter searches.
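The general idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the exact weight features, so the per-layer statistics below (mean, standard deviation, and norm of the weights and of their change since initialization) are assumptions, as are the helper names `weight_features` and `run_features`, the synthetic data, the hypothetical learning-rate hyperparameter, and the termination threshold. The regressor is scikit-learn's `RandomForestRegressor`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def weight_features(w_init, w_early):
    """Summary statistics of one layer's weights and of their change since
    initialization. The choice of statistics is illustrative only."""
    delta = w_early - w_init
    return np.array([
        w_early.mean(), w_early.std(), np.linalg.norm(w_early),
        delta.mean(), delta.std(), np.linalg.norm(delta),
    ])


def run_features(init_layers, early_layers, curve):
    """Concatenate per-layer weight features with the early learning curve."""
    per_layer = [weight_features(a, b) for a, b in zip(init_layers, early_layers)]
    return np.concatenate(per_layer + [np.asarray(curve, dtype=float)])


# Synthetic stand-in for a hyperparameter search: 40 runs, 2 conv layers each.
# All numbers below are made up for illustration.
rng = np.random.default_rng(0)
X, y = [], []
for _ in range(40):
    lr = 10 ** rng.uniform(-4, -1)  # hypothetical hyperparameter
    init = [rng.normal(0, 0.1, size=(8, 3, 3, 3)),
            rng.normal(0, 0.1, size=(16, 8, 3, 3))]
    # Pretend a few early updates moved the weights in proportion to lr.
    early = [w + lr * rng.normal(0, 1, size=w.shape) for w in init]
    curve = rng.uniform(0.1, 0.3, size=3)  # first three validation accuracies
    X.append(run_features(init, early, curve))
    y.append(0.5 + 0.1 * np.log10(lr) + rng.normal(0, 0.01))  # fake final accuracy
X, y = np.stack(X), np.array(y)

# Fit on completed runs, then predict the final accuracy of runs still in progress.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:30], y[:30])
predicted_final = model.predict(X[30:])

threshold = 0.25  # hypothetical cut-off for early termination
terminate = predicted_final < threshold

# Impurity-based importances indicate which features drive the prediction.
importances = model.feature_importances_
```

In this sketch, runs whose predicted final accuracy falls below `threshold` would be stopped early, and `importances` gives a rough ranking of the features, analogous to the feature-importance analysis mentioned in the abstract.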