How to interpret CCA output from vegan

user3420443 · Mar 20, 2014

I have performed a canonical correspondence analysis in R using the vegan package, but I find the output very difficult to understand. The triplot is understandable, but all the numbers I get from summary(cca) are confusing to me (I've just started to learn about ordination techniques). I would like to know how much of the variance in Y is explained by X (in this case, the environmental variables), and which of the independent variables are important in this model.

My output looks like this:

Partitioning of mean squared contingency coefficient:
              Inertia Proportion
Total           4.151     1.0000
Constrained     1.705     0.4109
Unconstrained   2.445     0.5891

Eigenvalues, and their contribution to the mean squared contingency coefficient 

Importance of components:
                        CCA1   CCA2    CCA3    CCA4    CCA5    CCA6      CCA7
Eigenvalue            0.6587 0.4680 0.34881 0.17690 0.03021 0.02257 0.0002014
Proportion Explained  0.1587 0.1127 0.08404 0.04262 0.00728 0.00544 0.0000500
Cumulative Proportion 0.1587 0.2714 0.35548 0.39810 0.40538 0.41081 0.4108600

                         CA1    CA2     CA3     CA4     CA5     CA6     CA7
Eigenvalue            0.7434 0.6008 0.36668 0.33403 0.28447 0.09554 0.02041
Proportion Explained  0.1791 0.1447 0.08834 0.08047 0.06853 0.02302 0.00492
Cumulative Proportion 0.5900 0.7347 0.82306 0.90353 0.97206 0.99508 1.00000

Accumulated constrained eigenvalues

Importance of components:
                        CCA1   CCA2   CCA3   CCA4    CCA5    CCA6      CCA7
Eigenvalue            0.6587 0.4680 0.3488 0.1769 0.03021 0.02257 0.0002014
Proportion Explained  0.3863 0.2744 0.2045 0.1037 0.01772 0.01323 0.0001200
Cumulative Proportion 0.3863 0.6607 0.8652 0.9689 0.98665 0.99988 1.0000000

Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions

Species scores

                 CCA1     CCA2    CCA3      CCA4      CCA5       CCA6
S.marinoi     -0.3890  0.39759  0.1080 -0.005704 -0.005372 -0.0002441
C.tripos       1.8428  0.23999 -0.1661 -1.337082  0.636225 -0.5204045
P.alata        1.6892  0.17910 -0.3119  0.997590  0.142028  0.0601177
P.seriata      1.4365 -0.15112 -0.8646  0.915351 -1.455675 -1.4054078
D.confervacea  0.2098 -1.23522  0.5317 -0.089496 -0.034250  0.0278820
C.decipiens    2.2896  0.65801 -1.0315 -1.246933 -0.428691  0.3649382
P.farcimen    -1.2897 -1.19148 -2.3562  0.032558  0.104148 -0.0068910
C.furca        1.4439 -0.02836 -0.9459  0.301348 -0.975261  0.4861669

Biplot scores for constraining variables

                CCA1    CCA2     CCA3     CCA4     CCA5     CCA6
Temperature  0.88651  0.1043 -0.07283 -0.30912 -0.22541  0.24771
Salinity     0.32228 -0.3490  0.30471  0.05140 -0.32600  0.44408
O2          -0.81650  0.4665 -0.07151  0.03457  0.20399 -0.20298
Phosphate    0.22667 -0.8415  0.41741 -0.17725 -0.06941 -0.06605
TotP        -0.33506 -0.6371  0.38858 -0.05094 -0.24700 -0.25107
Nitrate      0.15520 -0.3674  0.38238 -0.07154 -0.41349 -0.56582
TotN        -0.23253 -0.3958  0.16550 -0.25979 -0.39029 -0.68259
Silica       0.04449 -0.8382  0.15934 -0.22951 -0.35540 -0.25650

Which of all these numbers are important to my analysis? /anna

Answer

Gavin Simpson · Mar 20, 2014

How much variation is explained by X?

In a CCA, variance isn't variance in the normal sense. We express it as the "mean squared contingency coefficient", or "inertia". All the info you need to ascertain how much "variation" in Y is explained by X is contained in the section of the output that I reproduce below:

Partitioning of mean squared contingency coefficient:
              Inertia Proportion
Total           4.151     1.0000
Constrained     1.705     0.4109
Unconstrained   2.445     0.5891

In this example the total inertia is 4.151, and your X variables (these are the "Constraints") explain 1.705 units of inertia, which is about 41%, leaving about 59% unexplained.
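As a quick check of that arithmetic, the proportions can be recomputed from the inertia values in the table. This is only a sketch in base R; the variable names are mine, and the inputs are the rounded values printed above, so the result can differ from the printed proportions in the last decimal place.

```r
# Inertia values copied from the summary() output (already rounded)
total_inertia <- 4.151
constrained   <- 1.705
unconstrained <- 2.445

# Proportion of the total inertia in each part
prop_constrained   <- constrained / total_inertia
prop_unconstrained <- unconstrained / total_inertia

round(c(constrained = prop_constrained,
        unconstrained = prop_unconstrained), 3)
```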

The next section, on eigenvalues, reports both the inertia explained and the proportion explained by each axis, so you can see which axes contribute most to the explanatory "power" of the CCA (the Constrained part of the table above) and which to the unexplained "variance" (the Unconstrained part of the table above).
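The two "Proportion Explained" rows for the CCA axes in the output differ only in the denominator: one divides each eigenvalue by the total inertia, the other by the sum of the constrained eigenvalues. A small base R sketch, using the eigenvalues printed in the question (the object names are mine):

```r
# Constrained eigenvalues copied from the summary() output
eig_cca <- c(CCA1 = 0.6587, CCA2 = 0.4680, CCA3 = 0.34881, CCA4 = 0.17690,
             CCA5 = 0.03021, CCA6 = 0.02257, CCA7 = 0.0002014)
total_inertia <- 4.151

# Share of the total inertia carried by each constrained axis
prop_of_total <- eig_cca / total_inertia

# Share of the constrained inertia only
# (the "Accumulated constrained eigenvalues" table)
prop_of_constrained <- eig_cca / sum(eig_cca)

round(rbind(prop_of_total,
            cum_of_total       = cumsum(prop_of_total),
            prop_of_constrained,
            cum_of_constrained = cumsum(prop_of_constrained)), 4)
```

The cumulative share of the total inertia ends at roughly 0.41, which is the Constrained proportion from the partitioning table.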

The next section contains the ordination scores. Think of these as the coordinates of the points in the triplot. For some reason you don't show the site scores in the output above, but they would normally be there. Note that these have been scaled. By default this uses scaling = 2, so, as the header of that section says, species scores are scaled proportional to the eigenvalues and site scores are left unscaled, which puts the species points at the weighted average of the site scores, IIRC.
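If you want those coordinates as plain matrices rather than reading them off the printed summary, vegan's scores() function extracts them. A minimal sketch, assuming the vegan package is installed. Your data are not available here, so it fits a stand-in model to vegan's built-in varespec and varechem data sets, and the choice of constraints (Al + P + K) is arbitrary.

```r
library(vegan)

data(varespec)  # example community data shipped with vegan
data(varechem)  # matching environmental data

# Stand-in model; replace with your own cca() fit
mod <- cca(varespec ~ Al + P + K, data = varechem)

# Coordinates on the first two axes, using the same scaling as the summary
spp  <- scores(mod, display = "species", scaling = 2, choices = 1:2)
site <- scores(mod, display = "sites",   scaling = 2, choices = 1:2)
bp   <- scores(mod, display = "bp",      scaling = 2, choices = 1:2)

head(spp)   # species scores
head(site)  # site scores
bp          # biplot scores for the constraining variables
```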

The "Biplot" scores are the locations of the arrow heads or the labels on the arrows - I forget exactly how the plot is drawn now.
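To get a rough feel for which arrows dominate the first two axes of the triplot, the arrow lengths can be computed from the biplot scores in the question. This is only a descriptive sketch in base R using the CCA1 and CCA2 columns. A long arrow suggests a variable is strongly related to those two axes; it is not a significance test, and the plot may rescale the arrows to fit the figure.

```r
# CCA1 and CCA2 biplot scores copied from the summary() output
bp <- rbind(
  Temperature = c( 0.88651,  0.1043),
  Salinity    = c( 0.32228, -0.3490),
  O2          = c(-0.81650,  0.4665),
  Phosphate   = c( 0.22667, -0.8415),
  TotP        = c(-0.33506, -0.6371),
  Nitrate     = c( 0.15520, -0.3674),
  TotN        = c(-0.23253, -0.3958),
  Silica      = c( 0.04449, -0.8382)
)
colnames(bp) <- c("CCA1", "CCA2")

# Euclidean length of each arrow in the CCA1-CCA2 plane
arrow_len <- sqrt(rowSums(bp^2))

sort(round(arrow_len, 3), decreasing = TRUE)
```

On these numbers O2, Temperature, Phosphate and Silica have the longest arrows in that plane, with Temperature and O2 mostly along CCA1 and Phosphate and Silica mostly along CCA2.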

Which of all these numbers are important to my analysis?

All of them are important. If you think the triplot is important and interpretable, remember that it is based entirely on the information reported by summary(). If you have specific questions to ask of the data, then perhaps only certain sections will be of paramount importance to you.

However, StackOverflow is not the place to ask such questions of a statistical nature.