Large Scale Sparse Clustering
Ruqi Zhang, Zhiwu Lu
Large-scale clustering has found wide application in many fields and has received much attention in recent years. However, most existing large-scale clustering methods achieve only mediocre performance because they are sensitive to the noise that is unavoidable in large-scale data. To address this problem, we propose a large-scale sparse clustering (LSSC) algorithm based on a two-step optimization strategy: 1) k-means clustering over the large-scale data to obtain initial clustering results; 2) refinement of the initial results by a sparse coding algorithm. To guarantee the scalability of the second step, we use nonlinear approximation and dimension reduction techniques to speed up the sparse coding algorithm. Experimental results on both synthetic and real-world datasets demonstrate the promising performance of our LSSC algorithm.
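The two-step strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not give the LSSC objective, so the refinement step below is a generic stand-in rather than the authors' formulation: each point is sparse-coded against the k-means centroids with an L1 penalty, solved by ISTA, and then reassigned to the centroid with the largest coefficient magnitude. The function names (`kmeans`, `ista_sparse_codes`, `lssc_sketch`) and the parameters `lam` and `n_iter` are hypothetical, and the nonlinear approximation and dimension reduction speed-ups are not modeled.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)


def soft_threshold(A, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)


def ista_sparse_codes(X, D, lam=0.1, n_iter=100):
    """ISTA for min_A 0.5 * ||X - A D||_F^2 + lam * ||A||_1.

    X: (n, d) data, D: (k, d) dictionary (one atom per row), A: (n, k) codes.
    """
    # Lipschitz constant of the gradient: largest eigenvalue of D D^T.
    L = np.linalg.norm(D @ D.T, 2)
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T
        A = soft_threshold(A - grad / L, lam / L)
    return A


def lssc_sketch(X, k, lam=0.1):
    """Step 1: k-means initialization. Step 2: stand-in sparse refinement."""
    centroids, _ = kmeans(X, k)
    codes = ista_sparse_codes(X, centroids, lam=lam)
    # Assumption: reassign each point to the atom with the largest |coefficient|.
    labels = np.abs(codes).argmax(axis=1)
    return labels, codes
```

Calling `lssc_sketch(X, k)` on an `(n, d)` array returns one label per point together with the `(n, k)` sparse code matrix. Because coding against centroids depends on direction rather than distance, this stand-in behaves sensibly only when the cluster centers are not collinear with the origin.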