I running kmeans in matlab on a 400x1000 matrix and for some reason whenever I run the algorithm I get different results. Below is a code example:
[idx, ~, ~, ~] = kmeans(factor_matrix, 10, 'dist','sqeuclidean','replicates',20);
For some reason, each time I run this code I get different results? any ideas?
I am using it to identify multicollinearity issues.
Thanks for the help!
The k-means implementation in MATLAB has a randomized component: the selection of initial centers. This causes different outcomes. Practically however, MATLAB runs k-means a number of times and returns you the clustering with the lowest distortion. If you're seeing wildly different clusterings each time, it may mean that your data is not amenable to the kind of clusters (spherical) that k-means looks for, and is an indication toward trying other clustering algorithms (e.g. spectral ones).
You can get deterministic behavior by passing it an initial set of centers as one of the function arguments (the start
parameter). This will give you the same output clustering each time. There are several heuristics to choose the initial set of centers (e.g. K-means++).