Speeding up Very Fast Decision Tree with Low Computational Cost

Speeding up Very Fast Decision Tree with Low Computational Cost

Jian Sun, Hongyu Jia, Bo Hu, Xiao Huang, Hao Zhang, Hai Wan, Xibin Zhao

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 1272-1278. https://doi.org/10.24963/ijcai.2020/177

Very Fast Decision Tree (VFDT) is one of the most widely used online decision tree induction algorithms, and it provides high classification accuracy with theoretical guarantees. In VFDT, the split-attempt operation is essential for leaf-split. It is computation-intensive since it computes the heuristic measure of all attributes of a leaf. To reduce split-attempts, VFDT tries to split at constant intervals (for example, every 200 examples). However, this mechanism introduces split-delay for split can only happen at fixed intervals, which slows down the growth of VFDT and finally lowers accuracy. To address this problem, we first devise an online incremental algorithm that computes the heuristic measure of an attribute with a much lower computational cost. Then a subset of attributes is carefully selected to find a potential split timing using this algorithm. A split-attempt will be carried out once the timing is verified. By the whole process, computational cost and split-delay are lowered significantly. Comprehensive experiments are conducted using multiple synthetic and real datasets. Compared with state-of-the-art algorithms, our method reduces split-attempts by about 5 to 10 times on average with much lower split-delay, which makes our algorithm run faster and more accurate.
Keywords:
Data Mining: Mining Data Streams
Machine Learning: Online Learning
Machine Learning: Classification