Learn Smart with Less: Building Better Online Decision Trees with Fewer Training Examples

Ariyam Das, Jin Wang, Sahil M. Gandhi, Jae Lee, Wei Wang, Carlo Zaniolo

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 2209-2215. https://doi.org/10.24963/ijcai.2019/306

Online decision tree models are extensively used in many industrial machine learning applications for real-time classification tasks. These models are highly accurate, scalable, and easy to use in practice. The Very Fast Decision Tree (VFDT) is the classic online decision tree induction model and has been widely adopted due to its theoretical guarantees as well as its competitive performance. However, VFDT and its variants rely solely on conservative statistical measures like the Hoeffding bound to incrementally grow the tree. This makes these models extremely circumspect and limits their ability to learn fast. In this paper, we efficiently employ statistical resampling techniques to build an online tree faster using fewer examples. We first show theoretically that a naive implementation of resampling techniques such as the non-parametric bootstrap does not scale, owing to large memory and computational overheads. We mitigate this by proposing a robust, memory-efficient bootstrap simulation heuristic (Mem-ES) that successfully expedites the learning process. Experimental results on both synthetic data and large-scale real-world datasets demonstrate the efficiency and effectiveness of our proposed technique.
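To make the two background ideas in the abstract concrete, the sketch below illustrates (a) the VFDT-style Hoeffding-bound split test and (b) a naive non-parametric bootstrap resample. This is an illustrative approximation of the standard techniques, not the paper's Mem-ES heuristic; the function names, the default `delta`, and the unit value range are assumptions for the example.

```python
import math
import random

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the observed mean of n
    samples of a variable with range `value_range` lies within epsilon of
    the true mean."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best, gain_second, n, delta=1e-7, value_range=1.0):
    """VFDT-style test: split a leaf only when the gain gap between the two
    best attributes exceeds the Hoeffding bound for the n examples seen."""
    return (gain_best - gain_second) > hoeffding_bound(value_range, delta, n)

def bootstrap_resample(examples, rng=random):
    """Naive non-parametric bootstrap: draw len(examples) samples with
    replacement. Materializing many such resamples is the memory and
    compute overhead the paper's Mem-ES heuristic is designed to avoid."""
    return [rng.choice(examples) for _ in examples]
```

For instance, with 1000 examples and `delta=1e-7` the bound is roughly 0.09, so a gain gap of 0.2 triggers a split while a gap of 0.05 does not; the bound shrinks as more examples arrive, which is why VFDT grows the tree conservatively.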
Keywords:
Machine Learning: Classification
Machine Learning: Data Mining
Machine Learning: Online Learning
Machine Learning: Time-series; Data Streams