Dealing with missing values when calculating correlations

Delphine · Sep 16, 2011

I have a huge matrix with a lot of missing values. I want to get the correlations between the variables.

1. Is the solution

cor(na.omit(matrix))

better than the one below?

cor(matrix, use = "pairwise.complete.obs")

I have already selected only the variables with more than 20% missing values.

2. Which method gives the most meaningful result?
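The missing-value screen mentioned in the question can be done in base R with `colMeans(is.na(m))`, which gives the fraction of missing entries in each column. This is a minimal sketch on invented data; the matrix name `m` is an assumption, the 20% cut-off comes from the question, and the `>` comparison mirrors the question's wording (use `<=` instead if the intent is to keep the more complete variables).

```r
# Hypothetical data: 10 rows, 3 variables with different amounts of missingness
set.seed(1)
m <- matrix(rnorm(30), nrow = 10, dimnames = list(NULL, c("x", "y", "z")))
m[1, "x"] <- NA      # 1 of 10 missing (10%)
m[1:3, "y"] <- NA    # 3 of 10 missing (30%)
m[1:6, "z"] <- NA    # 6 of 10 missing (60%)

# is.na() gives a TRUE/FALSE matrix; colMeans() turns it into
# the fraction of missing values in each column
na_frac <- colMeans(is.na(m))
print(na_frac)

# Select variables using the 20% threshold from the question
threshold <- 0.2
m_kept <- m[, na_frac > threshold, drop = FALSE]
print(colnames(m_kept))
```

`drop = FALSE` keeps the result a matrix even if only one column passes the filter, so it can still be passed straight to `cor()`.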

Answer

IRTFM · Sep 16, 2011

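I would vote for the second option. na.omit() discards every row that has a missing value in any column, so with a large matrix and many missing values you can be left with very few (or no) complete rows, whereas "pairwise.complete.obs" computes each correlation from all the rows where both variables are observed. It sounds like you have a fair amount of missing data, though, so you should really be looking for a sensible multiple imputation strategy to fill in the gaps. See Harrell's text "Regression Modeling Strategies" for a wealth of guidance on how to do this properly.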
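The difference between the two options can be seen on a tiny made-up matrix. This is a sketch only: the data are invented, and the matrix is called `m` rather than `matrix` to avoid confusion with base R's `matrix()` function. The 0/1 indicator of observed values, passed to `crossprod()`, counts how many rows each pairwise correlation actually rests on.

```r
# Hypothetical toy data: 6 observations of 3 variables with scattered NAs
m <- cbind(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, NA, 6, 8, 10, 11),
  c = c(NA, 1, 2, NA, 3, 5)
)

# Option 1: na.omit() drops every row with an NA in any column (listwise
# deletion), so all correlations are computed on the same, smaller set of rows
complete_rows <- na.omit(m)
nrow(complete_rows)   # 3 rows survive (rows 3, 5 and 6)
r_listwise <- cor(complete_rows)

# Option 2: each correlation uses every row where that particular pair is observed
r_pairwise <- cor(m, use = "pairwise.complete.obs")

# Rows behind each pairwise correlation: t(obs) %*% obs counts, for every
# pair of columns, the rows where both are non-missing
obs <- 1 * (!is.na(m))
n_pairs <- crossprod(obs)
print(n_pairs)
```

With many variables, some pairwise correlations may rest on only a handful of rows, so it is worth looking at `n_pairs` before trusting individual entries of `r_pairwise`.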