Calculate correlation with cor(), only for numerical columns

wishihadabettername picture wishihadabettername · Aug 26, 2010 · Viewed 134.5k times · Source

I have a dataframe and would like to calculate the correlation (with Spearman, data is categorical and ranked) but only for a subset of columns. I tried with all, but R's cor() function only accepts numerical data (x must be numeric, says the error message), even if Spearman is used.

One brute approach is to delete the non-numerical columns from the dataframe. This is not as elegant, for speed I still don't want to calculate correlations between all columns.

I hope there is a way to simply say "calculate correlations for columns x, y, z". Column references could by number or by name. I suppose the flexible way to provide them would be through a vector.

Any suggestions are appreciated.

Answer

Greg picture Greg · Aug 26, 2010

if you have a dataframe where some columns are numeric and some are other (character or factor) and you only want to do the correlations for the numeric columns, you could do the following:

set.seed(10)

x = as.data.frame(matrix(rnorm(100), ncol = 10))
x$L1 = letters[1:10]
x$L2 = letters[11:20]

cor(x)

Error in cor(x) : 'x' must be numeric

but

cor(x[sapply(x, is.numeric)])

             V1         V2          V3          V4          V5          V6          V7
V1   1.00000000  0.3025766 -0.22473884 -0.72468776  0.18890578  0.14466161  0.05325308
V2   0.30257657  1.0000000 -0.27871430 -0.29075170  0.16095258  0.10538468 -0.15008158
V3  -0.22473884 -0.2787143  1.00000000 -0.22644156  0.07276013 -0.35725182 -0.05859479
V4  -0.72468776 -0.2907517 -0.22644156  1.00000000 -0.19305921  0.16948333 -0.01025698
V5   0.18890578  0.1609526  0.07276013 -0.19305921  1.00000000  0.07339531 -0.31837954
V6   0.14466161  0.1053847 -0.35725182  0.16948333  0.07339531  1.00000000  0.02514081
V7   0.05325308 -0.1500816 -0.05859479 -0.01025698 -0.31837954  0.02514081  1.00000000
V8   0.44705527  0.1698571  0.39970105 -0.42461411  0.63951574  0.23065830 -0.28967977
V9   0.21006372 -0.4418132 -0.18623823 -0.25272860  0.15921890  0.36182579 -0.18437981
V10  0.02326108  0.4618036 -0.25205899 -0.05117037  0.02408278  0.47630138 -0.38592733
              V8           V9         V10
V1   0.447055266  0.210063724  0.02326108
V2   0.169857120 -0.441813231  0.46180357
V3   0.399701054 -0.186238233 -0.25205899
V4  -0.424614107 -0.252728595 -0.05117037
V5   0.639515737  0.159218895  0.02408278
V6   0.230658298  0.361825786  0.47630138
V7  -0.289679766 -0.184379813 -0.38592733
V8   1.000000000  0.001023392  0.11436143
V9   0.001023392  1.000000000  0.15301699
V10  0.114361431  0.153016985  1.00000000