I've been using k-means to cluster my data in R, but I'd like to assess fit versus model complexity for my clustering using the Bayesian Information Criterion (BIC) and the AIC. The code I'm currently using in R is:
KClData <- kmeans(Data, centers=2, nstart= 100)
But I'd like to be able to extract the BIC and Log Likelihood. Any help would be greatly appreciated!
For anyone else landing here, there's a method proposed by Sherry Towers at http://sherrytowers.com/2013/10/24/k-means-clustering/ that uses the output of stats::kmeans. I quote:
The AIC can be calculated with the following function:
kmeansAIC = function(fit){
  m = ncol(fit$centers)
  n = length(fit$cluster)
  k = nrow(fit$centers)
  D = fit$tot.withinss
  return(D + 2*m*k)
}
From the help for stats::AIC, you can also see that the BIC can be calculated in a similar way to the AIC. An easy way to get the BIC is to replace the return() in the above function with this:
return(data.frame(AIC = D + 2*m*k,
BIC = D + log(n)*m*k))
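Going slightly beyond the quoted post, here is a minimal sketch that puts both pieces into one function and also reports a log-likelihood, since the question asks for it. The name `kmeansIC` is made up for illustration, and the log-likelihood rests on an assumption the quoted post does not state: each cluster is treated as a spherical Gaussian with unit variance and hard assignments. Under that assumption, -2 * logLik equals tot.withinss plus a constant that does not depend on the number of clusters, which is why the quoted AIC and BIC can use tot.withinss directly.

```r
# Sketch: AIC, BIC and a log-likelihood from a stats::kmeans fit.
# `kmeansIC` is a hypothetical helper name, not part of any package.
kmeansIC <- function(fit) {
  m <- ncol(fit$centers)    # number of variables (dimensions)
  n <- length(fit$cluster)  # number of observations
  k <- nrow(fit$centers)    # number of clusters
  D <- fit$tot.withinss     # total within-cluster sum of squares
  # Assumption: spherical, unit-variance Gaussian clusters with hard assignments
  logLik <- -0.5 * D - 0.5 * n * m * log(2 * pi)
  data.frame(logLik = logLik,
             AIC = D + 2 * m * k,       # same formula as the quoted function
             BIC = D + log(n) * m * k)  # same formula as the quoted replacement
}

set.seed(1)  # k-means uses random starting centres
fit <- kmeans(iris[, 1:4], centers = 3, nstart = 25)
kmeansIC(fit)
```

Because the unit-variance assumption is tied to the scale of the data, the values are only comparable between fits made on the same data with the same scaling.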
So you would use this as follows:
fit <- kmeans(x = data,centers = 6)
kmeansAIC(fit)
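To actually compare fit against model complexity, you would typically compute these values over a range of cluster counts and look for the minimum. The following is a rough sketch of that idea using the same formulas as above; the iris data, the standardisation with scale(), the range 2:8 and the nstart value are all choices made for illustration, not part of the quoted method.

```r
set.seed(42)              # k-means starts are random, so fix the seed
X <- scale(iris[, 1:4])   # example data; standardising is an assumption here
ks <- 2:8                 # candidate numbers of clusters (arbitrary range)

ic <- do.call(rbind, lapply(ks, function(k) {
  fit <- kmeans(X, centers = k, nstart = 25)
  m <- ncol(fit$centers)    # number of variables
  n <- length(fit$cluster)  # number of observations
  D <- fit$tot.withinss     # total within-cluster sum of squares
  data.frame(k = k,
             AIC = D + 2 * m * k,
             BIC = D + log(n) * m * k)
}))

ic                       # one row per candidate k
ic$k[which.min(ic$BIC)]  # k with the lowest BIC under these formulas
```

Treat the selected k as a starting point rather than a definitive answer, since the result depends on how the data are scaled.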