Cluster one-dimensional data optimally?

Laciel · Oct 24, 2011 · Viewed 29.2k times

Does anyone have a paper that explains how the Ckmeans.1d.dp algorithm works?

Or: what is the best way to do k-means clustering in one dimension?

Answer

user6417312 · Jun 3, 2016

Univariate k-means clustering can be solved in O(kn) time on already sorted input, using theoretical results on Monge matrices. That approach never became popular, most likely because of numerical instability and perhaps also because it is hard to code.

A better option is an O(kn lg n) method that is now implemented in Ckmeans.1d.dp version 3.4.6. This implementation is as fast as heuristic k-means but guarantees optimality, and its results can be orders of magnitude better than those of heuristic k-means, especially for large k.
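
The answer does not say how the O(kn lg n) bound is reached. One standard technique that gives this bound is divide-and-conquer over monotone split points, and the Python sketch below assumes that technique; it is not taken from the Ckmeans.1d.dp source. It relies on the property that, for sorted data, the best position of the last cluster boundary never moves left as the prefix grows. The name `optimal_cost_1d` is made up for illustration.

```python
from itertools import accumulate


def optimal_cost_1d(values, k):
    """Minimum within-cluster sum of squares for k clusters of 1-D data.

    Assumption: divide-and-conquer over monotone split points, used here
    as an illustration only (not the Ckmeans.1d.dp implementation).
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums of x and x^2 give the cost of any cluster in O(1).
    s1 = [0.0] + list(accumulate(xs))
    s2 = [0.0] + list(accumulate(x * x for x in xs))

    def cost(i, j):
        # Sum of squared deviations from the mean for xs[i:j], with i < j.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # prev[j]: best cost for the first j points using m - 1 clusters.
    prev = [0.0] + [inf] * n

    def fill(cur, lo, hi, opt_lo, opt_hi):
        # Fill cur[lo..hi]; the best split for each j lies in opt_lo..opt_hi.
        if lo > hi:
            return
        mid = (lo + hi) // 2
        best, best_i = inf, opt_lo
        for i in range(opt_lo, min(mid - 1, opt_hi) + 1):
            c = prev[i] + cost(i, mid)
            if c < best:
                best, best_i = c, i
        cur[mid] = best
        fill(cur, lo, mid - 1, opt_lo, best_i)   # splits left of mid's split
        fill(cur, mid + 1, hi, best_i, opt_hi)   # splits right of mid's split

    for m in range(1, k + 1):
        cur = [inf] * (n + 1)
        fill(cur, m, n, m - 1, n - 1)
        prev = cur
    return prev[n]
```

For example, `optimal_cost_1d([1, 2, 3, 10, 11, 12], 2)` splits the data into {1, 2, 3} and {10, 11, 12}, each contributing a sum of squares of 2. Each of the k rows costs O(n lg n) because every recursion level scans O(n) candidate split points in total.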

The generic dynamic programming solution by Richard Bellman (1973) does not exploit the specifics of the k-means problem, and its implied runtime is O(kn^3).
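
To make the dynamic programming idea concrete, here is a minimal Python sketch of a plain recurrence for 1-D k-means. It is not Bellman's formulation and not the package code; it only assumes that, once the data are sorted, every optimal cluster is a contiguous run. It uses one k-means-specific shortcut, prefix sums that give the cost of any cluster in O(1), so this sketch runs in O(kn^2). The name `kmeans_1d` is hypothetical.

```python
from itertools import accumulate


def kmeans_1d(values, k):
    """Return (cost, clusters) for an optimal 1-D k-means clustering.

    Illustrative O(k n^2) dynamic program; an assumption-laden sketch,
    not the algorithm of any particular paper or package.
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums of x and x^2.
    s1 = [0.0] + list(accumulate(xs))
    s2 = [0.0] + list(accumulate(x * x for x in xs))

    def cost(i, j):
        # Sum of squared deviations from the mean for xs[i:j], with i < j.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimum cost of splitting the first j points into m clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[m][j]: start index of the last cluster in that optimal split.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    cut[m][j] = i

    # Walk the cut table backwards to recover the clusters.
    clusters = []
    j = n
    for m in range(k, 0, -1):
        i = cut[m][j]
        clusters.append(xs[i:j])
        j = i
    clusters.reverse()
    return best[k][n], clusters
```

Calling `kmeans_1d([1, 2, 3, 10, 11, 12], 2)` yields the clusters `[1, 2, 3]` and `[10, 11, 12]`. The innermost loop tries every possible start of the last cluster, which is the step that faster methods avoid.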