COBRA: A Fast and Simple Method for Active Clustering with Pairwise Constraints

COBRA: A Fast and Simple Method for Active Clustering with Pairwise Constraints

Toon Van Craenendonck, Sebastijan Dumancic, Hendrik Blockeel

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 2871-2877. https://doi.org/10.24963/ijcai.2017/400

Clustering is inherently ill-posed: there often exist multiple valid clusterings of a single dataset, and without any additional information a clustering system has no way of knowing which clustering it should produce. This motivates the use of constraints in clustering, as they allow users to communicate their interests to the clustering system. Active constraint-based clustering algorithms select the most useful constraints to query, aiming to produce a good clustering using as few constraints as possible. We propose COBRA, an active method that first over-clusters the data by running K-means with a $K$ that is intended to be too large, and subsequently merges the resulting small clusters into larger ones based on pairwise constraints. In its merging step, COBRA is able to keep the number of pairwise queries low by maximally exploiting constraint transitivity and entailment. We experimentally show that COBRA outperforms the state of the art in terms of clustering quality and runtime, without requiring the number of clusters in advance.
Keywords:
Machine Learning: Active Learning
Machine Learning: Data Mining
Machine Learning: Semi-Supervised Learning
Machine Learning: Unsupervised Learning