How do I interpret the output of corrplot?

r plot statistics data-visualization r-corrplot

Superbest · Jun 19, 2014 · Viewed 7.1k times · Source

The corrplot package provides some neat plots, and its documentation includes examples.

But I don't understand the output. I can see that if you have a matrix A_ij, you can plot it as an arrangement of n by n square tiles, where the color of tile ij corresponds to the value of A_ij. But some examples appear to have more dimensions:

[Image: a corrplot example drawn with ellipse glyphs]

Here we can guess that the color shows the correlation coefficient, and that the orientation of the ellipse shows whether the correlation is negative or positive. What does the eccentricity show?

The documentation for method says:

the visualization method of correlation matrix to be used. Currently, it supports seven methods, named "circle" (default), "square", "ellipse", "number", "pie", "shade" and "color". See examples for details.

The areas of circles or squares show the absolute value of corresponding correlation coefficients. Method "pie" and "shade" came from Michael Friendly’s job (with some adjustment about the shade added on), and "ellipse" came from D.J. Murdoch and E.D. Chow’s job, see in section References.

So we know that the area, for circles and squares, should show the coefficient. What about the other dimensions, and other methods?

Answer

There is only one dimension shown by the plot.

Michael Friendly, in Corrgrams: Exploratory displays for correlation matrices (the corrplot documentation confusingly refers to this as his "job"), says:

In the shaded row, each cell is shaded blue or red depending on the sign of the correlation, and with the intensity of color scaled 0–100% in proportion to the magnitude of the correlation. (Such scaled colors are easily computed using RGB coding from red, (1, 0, 0), through white (1, 1, 1), to blue (0, 0, 1). For simplicity, we ignore the non-linearities of color reproduction and perception, but note that these are easily accommodated in the color mapping function.) White diagonal lines are added so that the direction of the correlation may still be discerned in black and white. This bipolar scale of color was chosen to leave correlations near 0 empty (white), and to make positive and negative values of equal magnitude approximately equally intensely shaded. Gray scale and other color schemes are implemented in our software (Section 6), but not illustrated here.

The bar and circular symbols also use the same scaled colors, but fill an area proportional to the absolute value of the correlation. For the bars, negative values are filled from the bottom, positive values from the top. The circles are filled clockwise for positive values, anti-clockwise for negative values. The ellipses have their eccentricity parametrically scaled to the correlation value (Murdoch and Chow, 1996). Perceptually, they have the property of becoming visually less prominent as the magnitude of the correlation increases, in contrast to the other glyphs.

(emphasis mine)
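As a rough illustration of the color scaling described in the quote, here is a minimal base R sketch. The function name `corr_to_rgb` is hypothetical, and the sign convention (red for negative, blue for positive) is an assumption; this is not corrplot's own code.

```r
# Map correlations in [-1, 1] to RGB triples on a red-white-blue scale.
# Assumption: red = negative, white = zero, blue = positive.
corr_to_rgb <- function(r) {
  stopifnot(all(r >= -1 & r <= 1))
  # colorRamp() interpolates linearly in RGB between the listed colours
  ramp <- colorRamp(c("red", "white", "blue"))
  # rescale r from [-1, 1] to [0, 1], then scale the 0-255 output to 0-1
  ramp((r + 1) / 2) / 255
}

# One row per correlation value, columns are red, green, blue
corr_to_rgb(c(-1, -0.5, 0, 0.5, 1))
```

The only input is the correlation itself, so the sign picks the hue and the magnitude picks the intensity.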


"Murdoch and Chow, 1996" is a publication describing the equation for drawing the ellipses (A Graphical Display of Large Correlation Matrices). The ellipses are apparently meant to be caricatures of bivariate normal distributions:

[Image: example ellipses illustrating bivariate normal distributions]
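To see how a single correlation value fixes the whole ellipse, here is a small base R sketch. It assumes the parametrization x = cos(θ + d/2), y = cos(θ − d/2) with cos(d) = r, which is my understanding of the Murdoch and Chow construction; the function name `corr_ellipse` is hypothetical.

```r
# Points on an ellipse depicting a correlation r.
# Assumed parametrization: x = cos(theta + d/2), y = cos(theta - d/2),
# where cos(d) = r.
corr_ellipse <- function(r, n = 100) {
  stopifnot(r >= -1, r <= 1)
  d <- acos(r)
  theta <- seq(0, 2 * pi, length.out = n)
  cbind(x = cos(theta + d / 2), y = cos(theta - d / 2))
}

# Draw the glyph for a few correlation values side by side
op <- par(mfrow = c(1, 5), mar = c(1, 1, 2, 1))
for (r in c(-0.9, -0.5, 0, 0.5, 0.9)) {
  plot(corr_ellipse(r), type = "l", asp = 1, axes = FALSE,
       xlab = "", ylab = "", main = paste("r =", r))
}
par(op)
```

Under this parametrization r = 0 gives a circle, and the ellipse narrows toward the line y = x as r approaches 1 and toward y = -x as r approaches -1. Orientation and eccentricity therefore both follow from r alone.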

So in conclusion, the only dimension shown is always the correlation coefficient (or the value of A_ij, to use the question's terminology) itself. The multiple apparent dimensions are redundant.
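One way to convince yourself of this is to draw the same correlation matrix with several methods and check that every glyph changes in step with the same number. The sketch below assumes the corrplot package is installed and uses the built-in mtcars data purely as an example.

```r
# The same correlation matrix drawn with three different glyph types.
M <- cor(mtcars)

if (requireNamespace("corrplot", quietly = TRUE)) {
  # Each call encodes the same values of M; only the glyph differs
  for (m in c("circle", "ellipse", "color")) {
    corrplot::corrplot(M, method = m)
  }
}
```

Comparing a cell such as mpg versus wt across the three plots shows the circle's area, the ellipse's shape, and the tile's color all tracking one coefficient.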
