Cluster Analysis in R with missing data

Question 1

r cluster-analysis r-daisy

akvallejos · Nov 12, 2014 · Viewed 7.3k times · Source

Answer

Mixture models can cluster a data set that contains missing values, under the assumption that the values are missing completely at random (MCAR). Information criteria such as BIC or ICL can then be used to select the number of clusters. The R package VarSelLCM can cluster such data, and it comes with a Shiny application for interpreting the results. A tutorial for the package is also available.
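
As a rough sketch of what this could look like (not taken from the package tutorial): the data below are simulated stand-ins, the column names V1 to V3 and the choice of 1 to 3 candidate clusters are assumptions, and the model-fitting step only runs if VarSelLCM is installed.

```r
set.seed(1)

# Simulated stand-in data (an assumption): two groups, three numeric columns
n <- 100
x <- data.frame(
  V1 = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 5)),
  V2 = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 5)),
  V3 = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 5))
)

# Remove 10 cells per column completely at random (MCAR);
# the index sets are disjoint, so no row loses more than one value
idx <- sample(n, 30)
x$V1[idx[1:10]]  <- NA
x$V2[idx[11:20]] <- NA
x$V3[idx[21:30]] <- NA

if (requireNamespace("VarSelLCM", quietly = TRUE)) {
  library(VarSelLCM)

  # Fit mixtures with 1 to 3 components, without variable selection;
  # the number of clusters is chosen with BIC
  res <- VarSelCluster(x, gvals = 1:3, vbleSelec = FALSE, crit.varsel = "BIC")

  # Estimated cluster label for every row, including rows with missing cells
  part <- fitted(res)
  print(table(part))
}
```

Calling VarSelShiny(res) afterwards should open the Shiny application mentioned above.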

Question 2

I have spent a good amount of time trying to find out how to do this. The only answer I have found so far is here: How to perform clustering without removing rows where NA is present in R

Unfortunately, this is not working for me.

So here is an example of my data (d in this example):

Q9Y6X2           NA -6.350055943 -5.78314068
Q9Y6X3           NA           NA -5.78314068
Q9Y6X6  0.831273549  4.875151493  0.78671493
Q9Y6Y8  4.831273549  0.457298979  5.59406985
Q9Y6Z4  4.831273549  4.875151493          NA

Here is what I tried:

> dist <- daisy(d,metric = "gower")
> hc <- hclust(dist)
Error in hclust(dist) : NA/NaN/Inf in foreign function call (arg 11)

From my understanding, daisy should be able to handle NA values, but I still get an error when I try to cluster the results.

Thanks.
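
A likely cause, sketched on the five rows above: daisy() skips missing values variable by variable, but when two rows have no column observed in both, their dissimilarity is set to NA, and hclust() rejects any NA in its input. The column names V1 to V3 are made up for illustration, and the final workaround (dropping rows until no NA dissimilarity remains) is only one possible option, not a recommendation.

```r
library(cluster)

# The five example rows; column names V1..V3 are assumptions
d <- data.frame(
  V1 = c(NA, NA, 0.831273549, 4.831273549, 4.831273549),
  V2 = c(-6.350055943, NA, 4.875151493, 0.457298979, 4.875151493),
  V3 = c(-5.78314068, -5.78314068, 0.78671493, 5.59406985, NA),
  row.names = c("Q9Y6X2", "Q9Y6X3", "Q9Y6X6", "Q9Y6Y8", "Q9Y6Z4")
)

dis <- daisy(d, metric = "gower")
m <- as.matrix(dis)

# Pairs of rows whose dissimilarity is NA
print(which(is.na(m) & upper.tri(m), arr.ind = TRUE))

# For every pair of rows, count the columns observed in both;
# a count of 0 means daisy() has nothing to compare
obs <- 1 * !is.na(as.matrix(d))
shared <- tcrossprod(obs)
print(shared)

# Possible workaround (an assumption): repeatedly drop the row involved in
# the most NA dissimilarities, then cluster the remaining rows
keep <- rownames(m)
while (anyNA(m[keep, keep, drop = FALSE])) {
  n_na <- rowSums(is.na(m[keep, keep, drop = FALSE]))
  keep <- setdiff(keep, names(which.max(n_na)))
}
hc <- hclust(as.dist(m[keep, keep]))
```

Dropping rows discards data, so on a real data set it may be worth checking how many rows are affected before choosing between this and a model-based approach.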
