Matlab: Kmeans gives different results each time

user1129988 picture user1129988 · Aug 27, 2012 · Viewed 8.6k times · Source

I running kmeans in matlab on a 400x1000 matrix and for some reason whenever I run the algorithm I get different results. Below is a code example:

[idx, ~, ~, ~] = kmeans(factor_matrix, 10, 'dist','sqeuclidean','replicates',20);

For some reason, each time I run this code I get different results? any ideas?

I am using it to identify multicollinearity issues.

Thanks for the help!

Answer

Ansari picture Ansari · Aug 27, 2012

The k-means implementation in MATLAB has a randomized component: the selection of initial centers. This causes different outcomes. Practically however, MATLAB runs k-means a number of times and returns you the clustering with the lowest distortion. If you're seeing wildly different clusterings each time, it may mean that your data is not amenable to the kind of clusters (spherical) that k-means looks for, and is an indication toward trying other clustering algorithms (e.g. spectral ones).

You can get deterministic behavior by passing it an initial set of centers as one of the function arguments (the start parameter). This will give you the same output clustering each time. There are several heuristics to choose the initial set of centers (e.g. K-means++).