Partial Correlations in R

user1897691 · Jan 10, 2013

I am trying to compute a partial correlation in R. I have two variables that I want to compare and, for now, only one control variable (this will change in the future).

I have looked online to try to work this out myself but it is difficult to understand the terminology used on the websites I have looked at. Can someone please explain how I would go about doing this and perhaps provide a simple example?

Data is in the following form:

                Project.Name Bugs.Project Changes.Project Orgs.Project
1     platform_external_svox            4             161            2
3 platform_packages_apps_Nfc           13             223            2
5      platform_system_media           36             307            2
7     platform_external_mtpd            2              30            2
9            platform_bionic           42            1061            4

I want the correlation between Bugs.Project and Orgs.Project with Changes.Project as the control variable. I have installed the ppcor package since it looks like it has the functionality that I need, but I am unsure how to use it. How do I put my data into a matrix and use the pcor function?

This is what I've been trying:

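library(ppcor)

# Columns 2, 4 and 3 of projRelateBugsOrgs are Bugs.Project, Orgs.Project
# and Changes.Project; the output below shows them under those names.
y.data <- data.frame(
  bpp = c(projRelateBugsOrgs[2]),
  opp = c(projRelateBugsOrgs[4]),
  cpp = c(projRelateBugsOrgs[3])
)

# Partial correlation of every pair, controlling for the remaining column
test <- pcor(y.data)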

I just used an example I found and tried to use my data in place of theirs. I don't understand my output.

It looks like this:

$estimate
                Bugs.Project Orgs.Project Changes.Project
Bugs.Project       1.0000000    0.3935535       0.9749296
Orgs.Project       0.3935535    1.0000000      -0.1800788
Changes.Project    0.9749296   -0.1800788       1.0000000

$p.value
                Bugs.Project Orgs.Project Changes.Project
Bugs.Project     0.00000e+00  2.09795e-07       0.0000000
Orgs.Project     2.09795e-07  0.00000e+00       0.0264442
Changes.Project  0.00000e+00  2.64442e-02       0.0000000

$statistic
                Bugs.Project Orgs.Project Changes.Project
Bugs.Project        0.000000     5.190442       53.122165
Orgs.Project        5.190442     0.000000       -2.219625
Changes.Project    53.122165    -2.219625        0.000000

$n
[1] 150

$gp
[1] 1

$method
[1] "pearson"

I think I want something from the $estimate table, but I'm not exactly sure what it's giving me.

Answer

mnel · Jan 10, 2013

From the Value section of help('pcor'):

Value

estimate a matrix of the partial correlation coefficient between two variables

p.value a matrix of the p value of the test

statistic a matrix of the value of the test statistic

n the number of samples

gp the number of given variables

method the correlation method used
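As a rough check on how these components fit together, the sketch below recomputes the statistic and p.value entries from the estimate, n and gp values printed in the question. The formula r * sqrt((n - 2 - gp) / (1 - r^2)) reproduces the printed statistics. The printed p-values are matched by a two-sided normal tail; that is an inference from these particular numbers, not something the help text quoted here states, and other versions of ppcor may use a different reference distribution.

```r
# Values copied from the output in the question.
r  <- c(bugs_orgs = 0.3935535, orgs_changes = -0.1800788)
n  <- 150   # $n: number of samples
gp <- 1     # $gp: number of variables controlled for

# Test statistic for each partial correlation.
statistic <- r * sqrt((n - 2 - gp) / (1 - r^2))

# Two-sided p-value from a standard normal tail (assumption: inferred
# from the printed numbers, not taken from the package documentation).
p_value <- 2 * pnorm(-abs(statistic))

print(round(statistic, 4))
print(signif(p_value, 4))
```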

The details section gives

Details

Partial correlation is the correlation of two variables while controlling for a third or more other variables.
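To make "controlling for" concrete, here is a minimal base-R sketch that computes one partial correlation two ways: with the standard first-order formula, and by correlating the residuals left after regressing each variable on the control. It uses only the five rows shown in the question, so the number it prints will not match the output from the full 150-row data set; the variable names are shortened for illustration.

```r
# The five rows shown in the question (illustration only).
bugs    <- c(4, 13, 36, 2, 42)
changes <- c(161, 223, 307, 30, 1061)
orgs    <- c(2, 2, 2, 2, 4)

# Route 1: first-order partial correlation formula.
r_bo <- cor(bugs, orgs)
r_bc <- cor(bugs, changes)
r_oc <- cor(orgs, changes)
by_formula <- (r_bo - r_bc * r_oc) / sqrt((1 - r_bc^2) * (1 - r_oc^2))

# Route 2: remove the linear effect of changes from both variables,
# then correlate what is left over.
by_residuals <- cor(resid(lm(bugs ~ changes)), resid(lm(orgs ~ changes)))

print(by_formula)
print(by_residuals)
```

Both routes give the same value, which is the sense in which a partial correlation is the correlation between two variables once the control variable's linear effect has been taken out of each.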

For your result

$estimate
                Bugs.Project Orgs.Project Changes.Project
Bugs.Project       1.0000000    0.3935535       0.9749296
Orgs.Project       0.3935535    1.0000000      -0.1800788
Changes.Project    0.9749296   -0.1800788       1.0000000

The partial correlation of Changes.Project and Orgs.Project is -0.1800788. This is the correlation of Changes.Project and Orgs.Project controlling for Bugs.Project.

The partial correlation of Changes.Project and Bugs.Project is 0.9749296. This is the correlation of Changes.Project and Bugs.Project controlling for Orgs.Project.

The partial correlation of Orgs.Project and Bugs.Project is 0.3935535. This is the correlation of Orgs.Project and Bugs.Project controlling for Changes.Project.
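Since each of these values is just one cell of a symmetric matrix, a single coefficient can be pulled out by row and column name. The sketch below rebuilds the $estimate matrix by hand from the printed output so that it runs without the original data; the object name estimate is only a stand-in.

```r
# Rebuild the printed $estimate matrix (values copied from the output).
vars <- c("Bugs.Project", "Orgs.Project", "Changes.Project")
estimate <- matrix(
  c(1.0000000,  0.3935535,  0.9749296,
    0.3935535,  1.0000000, -0.1800788,
    0.9749296, -0.1800788,  1.0000000),
  nrow = 3, byrow = TRUE, dimnames = list(vars, vars)
)

# Bugs vs Orgs, controlling for the remaining column (Changes.Project).
bugs_orgs <- estimate["Bugs.Project", "Orgs.Project"]
print(bugs_orgs)
```

With the real result object from the question, the same indexing would be test$estimate["Bugs.Project", "Orgs.Project"].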

You could get the same information (if you are only interested in this third case) from

pcor.test(y.data$Orgs.Project, y.data$Bugs.Project, y.data$Changes.Project)
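The question mentions that more control variables will be added later. The base-R sketch below shows one way the idea extends: every pairwise partial correlation, each controlling for all remaining columns, can be read off the inverse of the correlation matrix. The data are simulated and the second control, devs, is hypothetical. The sketch does not call ppcor; whether pcor.test accepts several controls at once as its third argument should be checked against help('pcor.test').

```r
set.seed(1)  # made-up data, for illustration only
n <- 50
changes <- rnorm(n)
devs    <- rnorm(n)                        # hypothetical second control
bugs    <- 2 * changes + devs + rnorm(n)
orgs    <- changes - devs + rnorm(n)
d <- data.frame(bugs, orgs, changes, devs)

# All pairwise partial correlations, each controlling for the remaining
# columns, from the inverse of the correlation matrix.
P <- solve(cor(d))
partial <- -P / sqrt(outer(diag(P), diag(P)))
diag(partial) <- 1

# Cross-check one entry: correlate the residuals of bugs and orgs after
# regressing each on both controls.
check <- cor(resid(lm(bugs ~ changes + devs)),
             resid(lm(orgs ~ changes + devs)))

print(partial["bugs", "orgs"])
print(check)
```

The two printed values agree, so adding a control variable amounts to adding a column to the data frame before computing the matrix.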