I am trying to do an F-test on the joint significance of fixed effects (individual-specific dummy variables) on a panel data OLS regression (in R), however I haven't found a way to accomplish this for a large number of fixed effects. Ideally, I would use a function in the plm
package, however I haven't found anything that specifically does this test.
This is something Stata does automatically when using the xtreg, fe
command. In Stata, the results looks like this:
------------------------------------------------------------------------------
F test that all u_i=0: F(49, 498) = 12.00 Prob > F = 0.000
Again, I am trying to reproduce the Stata result in R for a large number of dummy variables, perhaps specified by + factor(us.state)
using lm()
or model = "fe"
using plm()
.
Here is a reproducible example:
require(foreign)
voter <- read.dta("http://www.montana.edu/econ/cstoddard/562/panel_hw.dta")
reg1 <- lm(vaprate ~ gsp + midterm + regdead + WNCentral + South + Border
+ factor(state), data=voter)
which is equivalent to the following "within" regression using the plm
package.
require(plm)
reg1.fe <- plm(vaprate ~ gsp + midterm + regdead + WNCentral + South + Border,
data=voter, index = c("state","year"), model = "within")
So, the test would be the test that all the state dummy variables are jointly different from zero (jointly significant). This is a linear restriction on the unrestricted model (reg1 and reg1.fe above). This F-test is better explained on the following document (see slides 5-7).
http://jackman.stanford.edu/classes/350B/07/ftestforWeb.pdf
Here is one of my feeble attempts at creating an 'R' matrix for the F-test with null hypothesis: Rb = q where b is the matrix of coefficients (beta hat), and q is a vector of zeros.
d1 = length(unique(voter$stcode))-1
d2 = length(reg1$coefficients)
R = cbind(matrix(0,d1,d2),diag(d1))
linearHypothesis(reg1,R,rhs=0)
This doesn't work! And, I'm hoping there is a streamlined approach to testing for joint significance of all fixed effect dummy variables.
First, I'd like to suggest that your question could be improved by (1) providing a reproducible example, and (2) describing the precise test to which you refer to when you say 'F test'. A link to the Stata docs maybe? F is the distribution, so there can be a gazillion tests called an 'F test'.
If your substantive interest lies in determining whether the fixed effects model fits the data significantly better than OLS without fixed effects, then you could always use a likelihood ratio test. I'm sure there are many implementations in R, but the one provided by the lmtest
package is pretty convenient. Here's an example using a dataset distributed with the plm
package (you seem to have that installed, so it should be easy to try).
library(plm)
data(Produc)
library(lmtest)
mod <- lm(pcap ~ hwy + water, Produc)
mod.fe <- lm(pcap ~ hwy + water + factor(state), Produc)
lrtest(mod, mod.fe)
and the output:
Likelihood ratio test
Model 1: pcap ~ hwy + water
Model 2: pcap ~ hwy + water + factor(state)
#Df LogLik Df Chisq Pr(>Chisq)
1 4 -8038.1
2 51 -6712.4 47 2651.4 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
EDIT: OPTION 2
require(foreign)
voter <- read.dta("http://www.montana.edu/econ/cstoddard/562/panel_hw.dta")
reg1 <- lm(vaprate ~ gsp + midterm + regdead + WNCentral + South + Border
+ factor(state), data=voter)
library(plm)
reg1.fe <- plm(vaprate ~ gsp + midterm + regdead + WNCentral + South + Border,
data=voter, index = c("state","year"), model = "within")
reg1.pooling <- plm(vaprate ~ gsp + midterm + regdead + WNCentral + South + Border,
data=voter, index = c("state","year"), model = "pooling")
pFtest(reg1.fe, reg1.pooling)
OUTPUT:
F test for individual effects
data: vaprate ~ gsp + midterm + regdead + WNCentral + South + Border
F = 13.0712, df1 = 45, df2 = 498, p-value < 2.2e-16
alternative hypothesis: significant effects