Inter-node Hellinger Distance based Decision Tree

Inter-node Hellinger Distance based Decision Tree

Pritom Saha Akash, Md. Eusha Kadir, Amin Ahsan Ali, Mohammad Shoyaib

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 1967-1973. https://doi.org/10.24963/ijcai.2019/272

This paper introduces a new splitting criterion called Inter-node Hellinger Distance (iHD) and a weighted version of it (iHDw) for constructing decision trees. iHD measures the distance between the parent and each of the child nodes in a split using Hellinger distance. We prove that this ensures the mutual exclusiveness between the child nodes. The weight term in iHDw is concerned with the purity of individual child node considering the class imbalance problem. The combination of the distance and weight term in iHDw thus favors a partition where child nodes are purer and mutually exclusive, and skew insensitive. We perform an experiment over twenty balanced and twenty imbalanced datasets. The results show that decision trees based on iHD win against six other state-of-the-art methods on at least 14 balanced and 10 imbalanced datasets. We also observe that adding the weight to iHD improves the performance of decision trees on imbalanced datasets. Moreover, according to the result of the Friedman test, this improvement is statistically significant compared to other methods.
Keywords:
Machine Learning: Classification
Machine Learning: Data Mining