
Stability estimation for unsupervised clustering: A review

  • SUNY Buffalo
  • Roswell Park Cancer Institute

Research output: Contribution to journal, Review article (peer-reviewed)

69 Scopus citations

Abstract

Cluster analysis remains one of the most challenging yet fundamental tasks in unsupervised learning, in part because there are no labels or gold standards against which performance can be measured. Moreover, the many available clustering methods are governed by different objective functions, parameters, and dissimilarity measures. Clustering serves many purposes: it often plays a critical role in the early stages of exploratory data analysis and can also be an endpoint for knowledge discovery. Understanding the quality of a clustering is therefore of critical importance. Stability has emerged as a strategy for assessing the performance and reproducibility of data clustering. The key idea is to generate perturbed data sets that are very close to the original and to cluster them. If the clustering is stable, the clusters found in the original data will be preserved in the clusterings of the perturbed data. How the data are perturbed, and how similarity between clusterings is quantified, are nontrivial choices, and they are ultimately what set many stability estimation methods apart. In this review, we provide an overview of the very active research area of cluster stability estimation and discuss some of the open questions and challenges that remain in the field. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification.
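The perturb, recluster, and compare loop described in the abstract can be illustrated with a small sketch. It assumes one particular set of choices that are illustrative only and not taken from the review: the perturbation is subsampling 80% of the points without replacement, the clustering algorithm is plain k-means (Lloyd's algorithm), and similarity between clusterings is measured with the adjusted Rand index (ARI) on the shared points. The function names (`kmeans`, `adjusted_rand_index`, `subsample_stability`) are hypothetical helpers written for this example, in pure Python with no third-party libraries.

```python
import random
from collections import Counter
from math import comb


def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's k-means on 2-D tuples; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers at k random data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2)
            for p in points
        ]
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels


def adjusted_rand_index(a, b):
    """ARI between two labelings of the same points (invariant to label names)."""
    n = len(a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are one cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def subsample_stability(points, k, n_rounds=20, frac=0.8, seed=0):
    """Mean ARI between the full-data clustering and clusterings of subsamples."""
    rng = random.Random(seed)
    reference = kmeans(points, k, seed=seed)
    scores = []
    for r in range(n_rounds):
        # Perturbation: keep a random fraction of the points.
        idx = sorted(rng.sample(range(len(points)), int(frac * len(points))))
        sub_labels = kmeans([points[i] for i in idx], k, seed=seed + r + 1)
        # Compare on the shared points only.
        scores.append(adjusted_rand_index([reference[i] for i in idx], sub_labels))
    return sum(scores) / len(scores)


# Usage: two well-separated synthetic Gaussian blobs (made-up illustration data).
gen = random.Random(42)
data = ([(gen.gauss(0, 0.5), gen.gauss(0, 0.5)) for _ in range(50)]
        + [(gen.gauss(10, 0.5), gen.gauss(10, 0.5)) for _ in range(50)])
print(f"k=2 stability: {subsample_stability(data, k=2):.3f}")
```

A score near 1 means the subsample clusterings largely reproduce the full-data clustering. Swapping in a different perturbation (bootstrap resampling, added noise) or a different similarity measure changes the estimate, which is exactly the design space the review surveys.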

Original language: English
Article number: e1575
Journal: Wiley Interdisciplinary Reviews: Computational Statistics
Volume: 14
Issue number: 6
Status: Published (Nov 1, 2022)

Keywords

  • clustering
  • model selection
  • resampling
  • stability
  • unsupervised learning
  • validation
