Extracting PCA axes for further analysis

r pca lm
Koentjes picture Koentjes · Mar 28, 2013 · Viewed 16k times · Source

I am analysing data regarding reed fields. Variables I have measured are water depth, reed height, reed density, etc. As some of the variables are dependent, I performed a PCA in order to reduce these variables to 2 PCA-axes (N=104).

For executing the PCA I used the vegan package in R. My data looks like this:

row.names   Waterpeil   hoogte_max  Som Leeftijd_riet   PFD oppervlakte onderlaag_num   afst_rand
1   1   5   2.5 51  0.15686274  1.616921    8.127192    2   24.154590
2   3   9   2.5 44  0.13636364  1.564643    9.023642    2   8.349288
3   4   0   2.5 84  0.30952381  1.352548    8.498775    2   26.226896
4   5   0   3.5 58  0.43103448  1.384183    9.301617    1   57.320000
5   6   40  2.5 52  0.42307692  1.361262    10.316058   1   45.470000
6   7   5   3.0 19  0.00000000  1.429287    9.927788    1   36.720000
7   9   0   2.5 64  0.28125000  1.355100    8.029911    2   19.560000
8   11  120 3.5 29  0.03448276  1.336117    11.147484   1   252.630000
9   14  0   2.0 27  0.07407407  1.847756    7.445060    2   1.864342
10  16  20  2.5 57  0.24561404  1.582308    8.425177    2   9.490196
11  17  5   3.0 54  0.01851852  1.348305    9.315008    2   15.960000
12  18  0   1.5 5   1.00000000  1.643657    8.063648    2   6.526300
13  21  0   2.0 18  0.05555556  1.394964    8.752185    2   37.576955
14  22  20  2.0 48  0.16666667  1.617045    8.911028    1   11.592383
15  25  0   2.5 71  0.42253521  1.749114    7.271499    2   6.572772
16  26  0   2.0 50  0.30000000  1.464582    7.349908    2   9.849276
17  27  5   2.5 61  0.34426229  1.511217    8.379012    2   14.082827
18  28  5   2.0 123 0.06504065  1.538188    8.271017    2   11.658142
19  29  100 3.0 75  0.44000000  1.896483    7.968603    1   9.071897
20  30  100 3.0 95  0.55789474  1.768147    8.367626    1   2.300783
21  32  0   3.0 74  0.45945946  1.458793    9.453464    2   57.210000
22  33  15  3.0 66  0.24242424  1.572704    7.620507    1   8.700000
23  34  5   3.0 83  0.38554217  1.436063    11.636262   1   50.613265
24  35  5   2.5 58  0.31034483  1.313440    9.370347    2   52.605041
25  36  20  2.5 91  0.28571429  1.544032    8.451961    1   9.713351
26  37  10  2.5 34  0.23529412  1.524725    9.348687    2   6.920026
27  38  20  2.5 48  0.41666667  1.584892    7.780915    1   11.302639
28  39  40  2.5 51  0.15686274  1.535552    6.994035    1   18.999423
29  40  35  2.5 48  0.45833333  1.460579    9.073331    1   12.869075
30  41  5   3.0 58  0.43103448  1.747669    7.628542    2   3.860225
31  42  25  2.5 36  0.52777778

I have done this, this is the output for the first two axes:

y<-rda(nestendca2) 
summary(y)
               PC1       PC2    
Waterpeil     13.816422 -2.312641
hoogte_max     0.094747 -0.014497 
Som            2.955029 10.812549  
Leeftijd_riet  0.016476  0.019629  
PFD            0.007361 -0.003386  
oppervlakte    0.052943  0.039657 

Now I want to implement these two axes in a logistic regression, relating it to breeding success of a bird of prey that breeds in these fields.

How can I do this?

Answer

dickoa picture dickoa · Mar 28, 2013

Assuming that you use prcomp in R. Here is one way to do that

pca <- prcomp(~ Murder + Rape + Assault, data = USArrests, scale = TRUE)

(loadings <- pca$rotation)

##              PC1      PC2      PC3
## Murder  -0.58260  0.53395 -0.61276
## Rape    -0.53938 -0.81798 -0.19994
## Assault -0.60798  0.21402  0.76456

axes <- predict(pca, newdata = USArrests)
head(axes, 4)

##               PC1      PC2       PC3
## Alabama  -1.19803  0.83381 -0.162178
## Alaska   -2.30875 -1.52396  0.038336
## Arizona  -1.50333 -0.49830  0.878223
## Arkansas -0.17599  0.32473  0.071112

You can now use these new columns (axes) in your logistic regression if you wish. I'll show you just an example using a simple linear model.

dat <- cbind(USArrests, axes)
lm(UrbanPop ~ PC1 + PC2, data = dat)

## Call:
## lm(formula = UrbanPop ~ PC1 + PC2, data = dat)

## Coefficients:
## (Intercept)          PC1          PC2  
##       65.54        -2.58        -7.71