How to calculate BIC for k-means clustering in R

UnivStudent · Apr 5, 2013

I've been using k-means to cluster my data in R, but I'd like to assess the fit vs. model complexity of my clustering using the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC). The code I'm currently using in R is:

KClData <- kmeans(Data, centers=2, nstart= 100)

But I'd like to be able to extract the BIC and the log-likelihood. Any help would be greatly appreciated!

Answer

Andy Clifton · Aug 28, 2014

For anyone else landing here, there's a method proposed by Sherry Towers at http://sherrytowers.com/2013/10/24/k-means-clustering/, which uses output from stats::kmeans. I quote:

The AIC can be calculated with the following function:

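kmeansAIC = function(fit){
  m = ncol(fit$centers)     # number of variables (columns of the data)
  n = length(fit$cluster)   # number of observations (only needed for the BIC)
  k = nrow(fit$centers)     # number of clusters
  D = fit$tot.withinss      # total within-cluster sum of squares
  return(D + 2*m*k)
}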

From the help for stats::AIC, you can also see that the BIC is calculated the same way as the AIC, except that the penalty of 2 per parameter is replaced by log(n), where n is the number of observations. An easy way to get both is to replace the return() in the above function with this:

return(data.frame(AIC = D + 2*m*k,
                  BIC = D + log(n)*m*k))
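One common way to see where these formulas come from is sketched here under stated assumptions; it is not necessarily the derivation Towers had in mind. Model each cluster C_j as a spherical Gaussian with identity covariance (unit variance in every direction), treat the cluster assignments as fixed, and write x_i for observation i and mu_j for the centre of cluster j, with n, m, k and D as in the function above. The free parameters are the k centres, each with m coordinates:

```latex
\begin{aligned}
-2\log L &= \sum_{j=1}^{k}\sum_{i \in C_j} \lVert x_i - \mu_j \rVert^2 + nm\log(2\pi) = D + nm\log(2\pi),\\
p &= mk,\\
\mathrm{AIC} &= -2\log L + 2p \simeq D + 2mk,\\
\mathrm{BIC} &= -2\log L + p\log n \simeq D + mk\log n,
\end{aligned}
```

Here "≃" means "up to the additive constant nm log(2π)", which is the same for every k and so drops out when comparing fits. Because of the unit-variance assumption, D (and hence both criteria) depends on the units of each column, so standardising the data first, for example with scale(), is usually worth considering.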

So you would use this as follows:

fit <- kmeans(x = data, centers = 6)
kmeansAIC(fit)
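To use the criteria for choosing the number of clusters, one option is to fit k-means over a range of k and keep the k with the lowest BIC (or AIC). The sketch below uses the kmeansAIC() function with the modified data.frame return described above, and the built-in iris measurements, standardised with scale(), as stand-in data; the range k = 1..8, nstart = 25 and the seed are arbitrary illustrative choices, not part of the quoted method.

```r
# kmeansAIC() with the replaced return(), giving both AIC and BIC
kmeansAIC <- function(fit) {
  m <- ncol(fit$centers)    # number of variables
  n <- length(fit$cluster)  # number of observations
  k <- nrow(fit$centers)    # number of clusters
  D <- fit$tot.withinss     # total within-cluster sum of squares
  data.frame(AIC = D + 2 * m * k,
             BIC = D + log(n) * m * k)
}

# Stand-in data: the four numeric iris columns, standardised because
# D depends on the units of each column
x <- scale(iris[, 1:4])

set.seed(42)  # kmeans uses random starting centres
ks <- 1:8     # illustrative range of cluster counts

# One row per k: the k value followed by its AIC and BIC
scores <- do.call(rbind, lapply(ks, function(k) {
  fit <- kmeans(x, centers = k, nstart = 25)
  cbind(k = k, kmeansAIC(fit))
}))

print(scores)
cat("k with lowest BIC:", scores$k[which.min(scores$BIC)], "\n")
```

With n = 150 here, log(n) is larger than 2, so the BIC penalises each extra cluster more heavily than the AIC and will tend to pick a k no larger than the AIC does.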