On Metric DBSCAN with Low Doubling Dimension

Hu Ding, Fan Yang, Mingyue Wang

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 3080-3086. https://doi.org/10.24963/ijcai.2020/426

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a popular density-based clustering method for outlier recognition that has received tremendous attention in many different areas. A major issue with the original DBSCAN is that its time complexity can be as large as quadratic in the number of points. Most existing DBSCAN algorithms focus on developing efficient index structures to speed up the procedure in low-dimensional Euclidean space. However, to the best of our knowledge, research on DBSCAN in high-dimensional Euclidean space or general metric spaces is still quite limited. In this paper, we consider the metric DBSCAN problem under the assumption that the inliers (the input points excluding the outliers) have a low doubling dimension. We apply a novel randomized k-center clustering idea to reduce the complexity of the range query, which is the most time-consuming step in the whole DBSCAN procedure. Our proposed algorithms do not need to build any complicated data structures and are easy to implement in practice. The experimental results show that our algorithms can significantly outperform the existing DBSCAN algorithms in terms of running time.
Keywords:
Machine Learning: Clustering
Machine Learning: Unsupervised Learning
Data Mining: Clustering, Unsupervised Learning
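
Illustrative sketch (not the authors' algorithm): the abstract identifies the range query as the bottleneck and proposes a k-center style preprocessing to cut its cost. The Python sketch below shows one way such pruning can work in a general metric space, under several assumptions that do not come from the paper: it uses a plain deterministic greedy (Gonzalez-style) center selection rather than the randomized variant, it ignores outliers, it assumes the covering radius r = eps / 2, and all function names (`greedy_cover`, `build_index`, `range_query`, `brute_force_query`) are hypothetical. The pruning rule relies only on the triangle inequality: if every point lies within r of its assigned center, then two points within eps of each other have centers at distance at most eps + 2r.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    # Any metric works here; Euclidean distance is used only for the demo.
    return math.dist(a, b)


def brute_force_query(points, i, eps, dist):
    # Baseline: one query scans all n points, so n queries cost O(n^2) distances.
    return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]


def greedy_cover(points, r, dist):
    # Greedy farthest-point selection: add centers until every point is
    # within r of its nearest center. Returns the center point indices and,
    # for each point, the slot (position in `centers`) of its nearest center.
    if not points:
        return [], []
    centers = [0]
    nearest = [0] * len(points)
    gap = [dist(points[0], p) for p in points]
    while True:
        far = max(range(len(points)), key=gap.__getitem__)
        if gap[far] <= r:
            break
        centers.append(far)
        slot = len(centers) - 1
        for i, p in enumerate(points):
            d = dist(points[far], p)
            if d < gap[i]:
                gap[i] = d
                nearest[i] = slot
    return centers, nearest


def build_index(points, eps, dist, r=None):
    # Assumption: r = eps / 2 is an arbitrary illustrative choice.
    r = eps / 2 if r is None else r
    centers, nearest = greedy_cover(points, r, dist)
    members = defaultdict(list)
    for i, slot in enumerate(nearest):
        members[slot].append(i)
    # Two center slots are "neighbors" if their balls could contain a pair
    # of points within eps of each other (triangle inequality bound).
    neighbors = {
        a: [b for b in range(len(centers))
            if dist(points[centers[a]], points[centers[b]]) <= eps + 2 * r]
        for a in range(len(centers))
    }
    return nearest, members, neighbors


def range_query(points, i, eps, dist, index):
    # Only points assigned to neighboring centers are checked.
    nearest, members, neighbors = index
    return [j for b in neighbors[nearest[i]] for j in members[b]
            if dist(points[i], points[j]) <= eps]


if __name__ == "__main__":
    random.seed(1)
    pts = [(random.random(), random.random()) for _ in range(500)]
    idx = build_index(pts, 0.05, euclidean)
    same = all(
        sorted(range_query(pts, i, 0.05, euclidean, idx))
        == brute_force_query(pts, i, 0.05, euclidean)
        for i in range(len(pts))
    )
    print("pruned queries match brute force:", same)
```

When the data has low doubling dimension, each center has only a bounded number of neighboring centers, which is the intuition for why this kind of pruning can avoid the quadratic scan; the precise guarantees and the treatment of outliers are given in the paper itself, not in this sketch.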