I am trying to compute a partial correlation in R. I have the two variables that I want to compare and, for now, only one control variable (this will change in the future).
I have looked online to try to work this out myself but it is difficult to understand the terminology used on the websites I have looked at. Can someone please explain how I would go about doing this and perhaps provide a simple example?
Data is in the following form:
                Project.Name Bugs.Project Changes.Project Orgs.Project
1     platform_external_svox            4             161            2
3 platform_packages_apps_Nfc           13             223            2
5      platform_system_media           36             307            2
7     platform_external_mtpd            2              30            2
9            platform_bionic           42            1061            4
I want the correlation between Bugs.Project and Orgs.Project with Changes.Project as the control variable. I have downloaded the ppcor library since it looks like it has the functionality that I need, but I am unsure how to use it. How do I put my data into a matrix and use the pcor function?
This is what I've been trying:
y.data <- data.frame(
bpp=c(projRelateBugsOrgs[2]),
opp=c(projRelateBugsOrgs[4]),
cpp=c(projRelateBugsOrgs[3])
)
test <- pcor(y.data)
I just used an example I found and tried to use my data in place of theirs. I don't understand my output.
It looks like this:
$estimate
                Bugs.Project Orgs.Project Changes.Project
Bugs.Project       1.0000000    0.3935535       0.9749296
Orgs.Project       0.3935535    1.0000000      -0.1800788
Changes.Project    0.9749296   -0.1800788       1.0000000
$p.value
                Bugs.Project Orgs.Project Changes.Project
Bugs.Project     0.00000e+00  2.09795e-07       0.0000000
Orgs.Project     2.09795e-07  0.00000e+00       0.0264442
Changes.Project  0.00000e+00  2.64442e-02       0.0000000
$statistic
                Bugs.Project Orgs.Project Changes.Project
Bugs.Project        0.000000     5.190442       53.122165
Orgs.Project        5.190442     0.000000       -2.219625
Changes.Project    53.122165    -2.219625        0.000000
$n
[1] 150
$gp
[1] 1
$method
[1] "pearson"
I think I want something from the $estimate table, but I'm not exactly sure what it's giving me.
From the Value section of help('pcor'):
Value
estimate a matrix of the partial correlation coefficient between two variables
p.value a matrix of the p value of the test
statistic a matrix of the value of the test statistic
n the number of samples
gp the number of given variables
method the correlation method used
The details section gives
Details
Partial correlation is the correlation of two variables while controlling for a third or more other variables.
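To make that definition concrete, here is a minimal base-R sketch, assuming only the five sample rows posted in the question (the data frame `d` and the `res_*` and `r_*` names are made up for illustration). It regresses each variable of interest on the control variable and correlates the residuals, then checks the result against the standard closed-form expression for a single control variable. This is only meant to show what "controlling for" means, not how ppcor computes it internally.

```r
# Five sample rows copied from the question; `d` is an illustrative name.
d <- data.frame(
  Bugs.Project    = c(4, 13, 36, 2, 42),
  Changes.Project = c(161, 223, 307, 30, 1061),
  Orgs.Project    = c(2, 2, 2, 2, 4)
)

# Remove the linear effect of Changes.Project from each variable of interest.
res_bugs <- resid(lm(Bugs.Project ~ Changes.Project, data = d))
res_orgs <- resid(lm(Orgs.Project ~ Changes.Project, data = d))

# The correlation of the two residual series is the partial correlation.
r_resid <- cor(res_bugs, res_orgs)

# Closed form for one control variable, built from the pairwise correlations.
r    <- cor(d)
r_xy <- r["Bugs.Project", "Orgs.Project"]
r_xz <- r["Bugs.Project", "Changes.Project"]
r_yz <- r["Orgs.Project", "Changes.Project"]
r_formula <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

# The two routes should agree up to floating-point error.
print(c(residuals = r_resid, formula = r_formula))
```

With only five rows the number itself means little; the point is that both routes give the same value.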
For your result
$estimate
                Bugs.Project Orgs.Project Changes.Project
Bugs.Project       1.0000000    0.3935535       0.9749296
Orgs.Project       0.3935535    1.0000000      -0.1800788
Changes.Project    0.9749296   -0.1800788       1.0000000
The partial correlation of Changes.Project and Orgs.Project is -0.1800788. This is the correlation of Changes.Project and Orgs.Project controlling for Bugs.Project.
The partial correlation of Changes.Project and Bugs.Project is 0.9749296. This is the correlation of Changes.Project and Bugs.Project controlling for Orgs.Project.
The partial correlation of Orgs.Project and Bugs.Project is 0.3935535. This is the correlation of Orgs.Project and Bugs.Project controlling for Changes.Project, which is the value you asked for.
You could get the same information (if you are only interested in this third case) from:
pcor.test(y.data$Orgs.Project, y.data$Bugs.Project, y.data$Changes.Project)
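As a rough end-to-end sketch, assuming the ppcor package is installed and using only the five sample rows from the question (so the p-values are not meaningful), the two approaches can be put side by side. The names `single`, `all_pairs`, and `from_matrix` are just illustrative.

```r
library(ppcor)  # assumes the ppcor package is installed

# Five sample rows copied from the question, with the same column names
# that appear in the posted pcor() output.
y.data <- data.frame(
  Bugs.Project    = c(4, 13, 36, 2, 42),
  Orgs.Project    = c(2, 2, 2, 2, 4),
  Changes.Project = c(161, 223, 307, 30, 1061)
)

# One pair only: Orgs.Project vs Bugs.Project, controlling for Changes.Project.
single <- pcor.test(y.data$Orgs.Project, y.data$Bugs.Project,
                    y.data$Changes.Project)

# All pairs at once; pick the same pair out of the matrix by row/column name.
all_pairs   <- pcor(y.data)
from_matrix <- all_pairs$estimate["Orgs.Project", "Bugs.Project"]

# Both routes should report the same partial correlation.
print(single$estimate)
print(from_matrix)
```

When you add more control variables later, my understanding is that the third argument of pcor.test can also be a matrix or data frame holding several columns, but check help('pcor.test') to confirm that for your installed version.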