F-test on Fixed Effects in R (Panel Data)

baha-kev picture baha-kev · May 30, 2011 · Viewed 14.2k times · Source

I am trying to do an F-test on the joint significance of fixed effects (individual-specific dummy variables) on a panel data OLS regression (in R), however I haven't found a way to accomplish this for a large number of fixed effects. Ideally, I would use a function in the plm package, however I haven't found anything that specifically does this test.

This is something Stata does automatically when using the xtreg, fe command. In Stata, the results looks like this:

------------------------------------------------------------------------------
F test that all u_i=0:  F(49, 498) =    12.00   Prob > F = 0.000

Again, I am trying to reproduce the Stata result in R for a large number of dummy variables, perhaps specified by + factor(us.state) using lm() or model = "fe" using plm().

Here is a reproducible example:

require(foreign)

voter <- read.dta("http://www.montana.edu/econ/cstoddard/562/panel_hw.dta")

reg1 <- lm(vaprate ~ gsp + midterm + regdead + WNCentral + South + Border
               + factor(state), data=voter)

which is equivalent to the following "within" regression using the plm package.

require(plm)

reg1.fe <- plm(vaprate ~ gsp + midterm + regdead + WNCentral + South + Border,
data=voter, index = c("state","year"), model = "within")

So, the test would be the test that all the state dummy variables are jointly different from zero (jointly significant). This is a linear restriction on the unrestricted model (reg1 and reg1.fe above). This F-test is better explained on the following document (see slides 5-7).

http://jackman.stanford.edu/classes/350B/07/ftestforWeb.pdf

Here is one of my feeble attempts at creating an 'R' matrix for the F-test with null hypothesis: Rb = q where b is the matrix of coefficients (beta hat), and q is a vector of zeros.

d1 = length(unique(voter$stcode))-1
d2 = length(reg1$coefficients)
R = cbind(matrix(0,d1,d2),diag(d1))

linearHypothesis(reg1,R,rhs=0)

This doesn't work! And, I'm hoping there is a streamlined approach to testing for joint significance of all fixed effect dummy variables.

Answer

Vincent picture Vincent · May 30, 2011

First, I'd like to suggest that your question could be improved by (1) providing a reproducible example, and (2) describing the precise test to which you refer to when you say 'F test'. A link to the Stata docs maybe? F is the distribution, so there can be a gazillion tests called an 'F test'.

If your substantive interest lies in determining whether the fixed effects model fits the data significantly better than OLS without fixed effects, then you could always use a likelihood ratio test. I'm sure there are many implementations in R, but the one provided by the lmtest package is pretty convenient. Here's an example using a dataset distributed with the plm package (you seem to have that installed, so it should be easy to try).

library(plm)
data(Produc)

library(lmtest)
mod <- lm(pcap ~ hwy + water, Produc)
mod.fe <- lm(pcap ~ hwy + water + factor(state), Produc)
lrtest(mod, mod.fe)  

and the output:

Likelihood ratio test

Model 1: pcap ~ hwy + water
Model 2: pcap ~ hwy + water + factor(state)
  #Df  LogLik Df  Chisq Pr(>Chisq)    
1   4 -8038.1                         
2  51 -6712.4 47 2651.4  < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

EDIT: OPTION 2

require(foreign)
voter <- read.dta("http://www.montana.edu/econ/cstoddard/562/panel_hw.dta")
reg1 <- lm(vaprate ~ gsp + midterm + regdead + WNCentral + South + Border
               + factor(state), data=voter)

library(plm)
reg1.fe <- plm(vaprate ~ gsp + midterm + regdead + WNCentral + South + Border,
data=voter, index = c("state","year"), model = "within")

reg1.pooling <- plm(vaprate ~ gsp + midterm + regdead + WNCentral + South + Border,
data=voter, index = c("state","year"), model = "pooling")

pFtest(reg1.fe, reg1.pooling)

OUTPUT:

    F test for individual effects

data:  vaprate ~ gsp + midterm + regdead + WNCentral + South + Border 
F = 13.0712, df1 = 45, df2 = 498, p-value < 2.2e-16
alternative hypothesis: significant effects