I have performed a canonical correspondence analysis in R using the vegan package, but I find the output very difficult to understand. The triplot is understandable, but all the numbers I get from summary(cca) are confusing to me (I've just started to learn about ordination techniques). I would like to know how much of the variance in Y is explained by X (in this case, the environmental variables), and which of the independent variables are important in this model.
My output looks like this:
Partitioning of mean squared contingency coefficient:
Inertia Proportion
Total 4.151 1.0000
Constrained 1.705 0.4109
Unconstrained 2.445 0.5891
Eigenvalues, and their contribution to the mean squared contingency coefficient
Importance of components:
CCA1 CCA2 CCA3 CCA4 CCA5 CCA6 CCA7
Eigenvalue 0.6587 0.4680 0.34881 0.17690 0.03021 0.02257 0.0002014
Proportion Explained 0.1587 0.1127 0.08404 0.04262 0.00728 0.00544 0.0000500
Cumulative Proportion 0.1587 0.2714 0.35548 0.39810 0.40538 0.41081 0.4108600
CA1 CA2 CA3 CA4 CA5 CA6 CA7
Eigenvalue 0.7434 0.6008 0.36668 0.33403 0.28447 0.09554 0.02041
Proportion Explained 0.1791 0.1447 0.08834 0.08047 0.06853 0.02302 0.00492
Cumulative Proportion 0.5900 0.7347 0.82306 0.90353 0.97206 0.99508 1.00000
Accumulated constrained eigenvalues
Importance of components:
CCA1 CCA2 CCA3 CCA4 CCA5 CCA6 CCA7
Eigenvalue 0.6587 0.4680 0.3488 0.1769 0.03021 0.02257 0.0002014
Proportion Explained 0.3863 0.2744 0.2045 0.1037 0.01772 0.01323 0.0001200
Cumulative Proportion 0.3863 0.6607 0.8652 0.9689 0.98665 0.99988 1.0000000
Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions
Species scores
CCA1 CCA2 CCA3 CCA4 CCA5 CCA6
S.marinoi -0.3890 0.39759 0.1080 -0.005704 -0.005372 -0.0002441
C.tripos 1.8428 0.23999 -0.1661 -1.337082 0.636225 -0.5204045
P.alata 1.6892 0.17910 -0.3119 0.997590 0.142028 0.0601177
P.seriata 1.4365 -0.15112 -0.8646 0.915351 -1.455675 -1.4054078
D.confervacea 0.2098 -1.23522 0.5317 -0.089496 -0.034250 0.0278820
C.decipiens 2.2896 0.65801 -1.0315 -1.246933 -0.428691 0.3649382
P.farcimen -1.2897 -1.19148 -2.3562 0.032558 0.104148 -0.0068910
C.furca 1.4439 -0.02836 -0.9459 0.301348 -0.975261 0.4861669
Biplot scores for constraining variables
CCA1 CCA2 CCA3 CCA4 CCA5 CCA6
Temperature 0.88651 0.1043 -0.07283 -0.30912 -0.22541 0.24771
Salinity 0.32228 -0.3490 0.30471 0.05140 -0.32600 0.44408
O2 -0.81650 0.4665 -0.07151 0.03457 0.20399 -0.20298
Phosphate 0.22667 -0.8415 0.41741 -0.17725 -0.06941 -0.06605
TotP -0.33506 -0.6371 0.38858 -0.05094 -0.24700 -0.25107
Nitrate 0.15520 -0.3674 0.38238 -0.07154 -0.41349 -0.56582
TotN -0.23253 -0.3958 0.16550 -0.25979 -0.39029 -0.68259
Silica 0.04449 -0.8382 0.15934 -0.22951 -0.35540 -0.25650
Which of all these numbers are important to my analysis? /anna
In a CCA, variance isn't variance in the usual sense. We express it as the "mean squared contingency coefficient", or "inertia". All the information you need to work out how much "variation" in Y is explained by X is in the section of the output that I reproduce below:
Partitioning of mean squared contingency coefficient:
Inertia Proportion
Total 4.151 1.0000
Constrained 1.705 0.4109
Unconstrained 2.445 0.5891
In this example the total inertia is 4.151, and your X variables (the "Constraints") explain 1.705 units of it, which is about 41%, leaving about 59% unexplained.
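The percentages are simply ratios of the inertia values. As a minimal sketch in base R, with the numbers typed in by hand from the output above (the variable names are made up for illustration):

```r
# Inertia values copied by hand from the summary() output above
total_inertia         <- 4.151
constrained_inertia   <- 1.705  # explained by the environmental variables
unconstrained_inertia <- 2.445  # left unexplained

# Each part as a proportion of the total inertia
prop_constrained   <- constrained_inertia / total_inertia
prop_unconstrained <- unconstrained_inertia / total_inertia

round(c(constrained = prop_constrained,
        unconstrained = prop_unconstrained), 3)
```

The results can differ from the printed proportions in the last digit, because the inertia values in the output are themselves rounded.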
The next section, on eigenvalues, shows for each axis both the inertia it accounts for and the proportion of the total, so you can see which axes contribute most to the explanatory "power" of the CCA (the CCA axes, which make up the Constrained part of the table above) and to the unexplained "variance" (the CA axes, which make up the Unconstrained part of the table above).
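To see how the eigenvalue tables relate to the inertia partition, here is a rough sketch in base R using the eigenvalues typed in by hand from the question's output (the variable names are only illustrative):

```r
# Eigenvalues copied by hand from the output above
cca_eig <- c(CCA1 = 0.6587, CCA2 = 0.4680, CCA3 = 0.34881, CCA4 = 0.17690,
             CCA5 = 0.03021, CCA6 = 0.02257, CCA7 = 0.0002014)
ca_eig  <- c(CA1 = 0.7434, CA2 = 0.6008, CA3 = 0.36668, CA4 = 0.33403,
             CA5 = 0.28447, CA6 = 0.09554, CA7 = 0.02041)

# All eigenvalues together add up to the total inertia (about 4.151)
total_inertia <- sum(cca_eig) + sum(ca_eig)

# The constrained eigenvalues add up to the constrained inertia (about 1.705)
sum(cca_eig)

# "Proportion Explained": each eigenvalue as a share of the total inertia
round(cca_eig / total_inertia, 4)

# "Accumulated constrained eigenvalues": cumulative share of the
# constrained inertia only, so this table ends at 1
round(cumsum(cca_eig) / sum(cca_eig), 4)
```

This is why CCA1 shows up as about 16% in the first table (share of all inertia) and about 39% in the second (share of the constrained inertia).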
The next section contains the ordination scores. Think of these as the coordinates of the points in the triplot. For some reason you don't show the site scores in the output above, but they would normally be there. Note that these have been scaled. By default this uses scaling = 2, so, as the header of that section says, species scores are scaled in proportion to the eigenvalues and site scores are left unscaled; IIRC that puts the species points at the weighted averages of the site scores.
The "Biplot" scores are the locations of the arrow heads or the labels on the arrows - I forget exactly how the plot is drawn now.
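If you want the coordinates themselves rather than the printed summary, vegan's scores() function can extract them. The sketch below is only an illustration: it uses the varespec and varechem example data that ship with vegan as a stand-in for your data, and the model formula is an arbitrary choice, not your model.

```r
library(vegan)

# Example data shipped with vegan, used only as a stand-in for your own data
data(varespec)   # species (community) matrix
data(varechem)   # environmental variables

# Hypothetical model: three constraints picked only for illustration
mod <- cca(varespec ~ Al + P + K, data = varechem)

# Coordinates for the first two axes with scaling = 2
site_sc    <- scores(mod, display = "sites",   choices = 1:2, scaling = 2)
species_sc <- scores(mod, display = "species", choices = 1:2, scaling = 2)
arrow_sc   <- scores(mod, display = "bp",      choices = 1:2, scaling = 2)

head(site_sc)  # one row per site
arrow_sc       # one row per constraining variable (biplot scores)
```

With your own fitted object in place of mod, the same calls should return the site, species and biplot scores shown by summary().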
All of them are important: if you think the triplot is important and interpretable, it is based entirely on the information reported by summary(). If you have specific questions to ask of the data, then perhaps only certain sections will be of paramount importance to you.
However, StackOverflow is not the place to ask such questions of a statistical nature.