A Clustering-based framework for Classifying Data Streams

Xuyang Yan, Abdollah Homaifar, Mrinmoy Sarkar, Abenezer Girma, Edward Tunstel

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 3257-3263. https://doi.org/10.24963/ijcai.2021/448

The non-stationary nature of data streams strongly challenges traditional machine learning techniques. Although some solutions have been proposed to extend traditional machine learning techniques to data streams, these approaches either require an initial label set or rely on specialized design parameters. The overlap among classes and the labeling of data streams are further major challenges for classifying data streams. In this paper, we propose a clustering-based data stream classification framework that handles non-stationary data streams without using an initial label set. A density-based stream clustering procedure with a dynamic threshold is used to capture novel concepts, and an effective active label querying strategy is introduced to continuously learn new concepts from the data streams. The sub-cluster structure of each cluster is explored to handle the overlap among classes. Experimental results and quantitative comparison studies show that the proposed method performs statistically better than, or comparably to, the existing methods.
Keywords:
Machine Learning: Online Learning
Data Mining: Classification
Data Mining: Mining Data Streams
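
The workflow summarized in the abstract (cluster incoming points, treat points outside every cluster as novel concepts, and query a label only for those) can be illustrated with a minimal sketch. This is not the authors' algorithm: the centroid-based clusters, the threshold update rule (a multiple of the running mean absorption distance), and the names `StreamClassifier`, `oracle`, `scale`, and `initial_threshold` are all assumptions made for illustration, and the sub-cluster handling of class overlap is not modeled.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Cluster:
    center: List[float]
    label: int
    count: int = 1


class StreamClassifier:
    """Toy cluster-then-query classifier (illustrative, not the paper's method)."""

    def __init__(self, oracle: Callable[[Sequence[float]], int],
                 scale: float = 2.0, initial_threshold: float = 1.0):
        self.oracle = oracle                # hypothetical labeler, e.g. a human annotator
        self.scale = scale                  # hypothetical multiplier for the threshold update
        self.threshold = initial_threshold  # no initial label set is needed, only a start radius
        self.clusters: List[Cluster] = []
        self.queries = 0
        self._dist_sum = 0.0
        self._dist_n = 0

    def _nearest(self, x: Sequence[float]) -> Tuple[Optional[Cluster], float]:
        best, best_d = None, math.inf
        for c in self.clusters:
            d = math.dist(c.center, x)
            if d < best_d:
                best, best_d = c, d
        return best, best_d

    def process(self, x: Sequence[float]) -> int:
        cluster, d = self._nearest(x)
        if cluster is None or d > self.threshold:
            # Point lies outside every cluster: treat it as a novel concept
            # and spend one label query on it.
            label = self.oracle(x)
            self.queries += 1
            self.clusters.append(Cluster(list(x), label))
            return label
        # Otherwise absorb the point and move the center incrementally.
        cluster.count += 1
        cluster.center = [m + (v - m) / cluster.count
                          for m, v in zip(cluster.center, x)]
        # Assumed dynamic threshold: scale times the mean absorption distance.
        self._dist_sum += d
        self._dist_n += 1
        self.threshold = max(self.scale * self._dist_sum / self._dist_n, 1e-9)
        return cluster.label


# Two well-separated groups; the oracle stands in for a human labeler.
stream = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0),
          (0.0, 0.5), (10.5, 10.0), (10.0, 10.5)]
clf = StreamClassifier(oracle=lambda x: 0 if x[0] < 5 else 1)
preds = [clf.process(x) for x in stream]
print(preds, "label queries:", clf.queries)
```

In this toy stream only the first point of each group triggers a label query; every later point inherits the label of the cluster that absorbs it. A density-based procedure such as the one the abstract describes would replace the single-centroid clusters used here.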