I have a panel data set in R (time and cross-section) and would like to compute standard errors that are clustered along two dimensions, because my residuals are correlated both ways. Googling around, I found http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-r/, which provides a function to do this. It seems a bit ad hoc, so I wanted to know: is there a tested package that does this?
I know the sandwich package does HAC standard errors, but it doesn't do double clustering (i.e., clustering along two dimensions).
This is an old question. But seeing as people still appear to be landing on it, I thought I'd provide some modern approaches to multiway clustering in R:
fixest::feols()
library(fixest)
nlswork = haven::read_dta("http://www.stata-press.com/data/r14/nlswork.dta")
est_feols = feols(ln_wage ~ age | race + year, cluster = ~race+year, data = nlswork)
est_feols
## An important feature of fixest: We can _instantaneously_ compute other
## VCOV matrices / SEs on the fly with summary.fixest(). No need to re-run
## the model!
summary(est_feols, se = 'standard') ## vanilla SEs
summary(est_feols, se = 'hetero') ## robust SEs
summary(est_feols, se = 'twoway') ## diff syntax, but same as original model
summary(est_feols, cluster = c('race', 'year')) ## ditto
summary(est_feols, cluster = ~race^year) ## interacted cluster vars
summary(est_feols, cluster = ~ race + year + idcode) ## add third cluster var (not in original model call)
## etc.
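If you need the clustered variance-covariance matrix itself (say, for downstream Wald tests), you can pull it out directly. A minimal sketch, assuming vcov.fixest() accepts the same cluster argument as summary.fixest():

## Extract the two-way clustered VCOV matrix (e.g. for downstream tests).
## Assumes vcov.fixest() takes the same `cluster` argument as summary.fixest().
V_twoway = vcov(est_feols, cluster = ~race + year)
sqrt(diag(V_twoway))  ## should reproduce the clustered SEs from summary(est_feols)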
lfe::felm()
library(lfe)
## Note: the third "| 0" slot means we're not using IV (no instruments)
est_felm = felm(ln_wage ~ age | race + year | 0 | race + year, data = nlswork)
summary(est_felm)
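The multiway-clustered VCOV also lives on the fitted felm object if you want to work with it directly; a quick sketch, where the clustervcv field name is my recollection of lfe's object structure rather than something guaranteed:

## The clustered VCOV should be stored on the felm object (field name assumed).
se_felm = sqrt(diag(est_felm$clustervcv))
se_felm  ## should match the cluster s.e. column in summary(est_felm)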
sandwich
library(sandwich)
library(lmtest)
est_sandwich = lm(ln_wage ~ age + factor(race) + factor(year), data = nlswork)
coeftest(est_sandwich, vcov = vcovCL, cluster = ~ race + year)
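Under the hood, vcovCL() handles multiple cluster dimensions via the Cameron, Gelbach & Miller inclusion-exclusion approach: add the one-way clustered matrices and subtract the matrix clustered on their intersection. Here's a rough sketch of that decomposition, done on a complete-case subset so the cluster vectors line up with the model (approximate only, since the default small-sample adjustments can differ slightly):

## Cameron-Gelbach-Miller decomposition by hand.
dat = na.omit(nlswork[, c("ln_wage", "age", "race", "year")])
m   = lm(ln_wage ~ age + factor(race) + factor(year), data = dat)
V_cgm = vcovCL(m, cluster = dat$race) +
        vcovCL(m, cluster = dat$year) -
        vcovCL(m, cluster = interaction(dat$race, dat$year))
sqrt(diag(V_cgm))["age"]  ## close to the clustered SE from coeftest() above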
Aaaand, just to belabour the point about speed, here's a benchmark of the three approaches (using the same two fixed effects and two-way clustering).
library(microbenchmark)

est_feols    = function() feols(ln_wage ~ age | race + year, cluster = ~race+year, data = nlswork)
est_felm     = function() felm(ln_wage ~ age | race + year | 0 | race + year, data = nlswork)
est_sandwich = function() coeftest(lm(ln_wage ~ age + factor(race) + factor(year), data = nlswork),
                                   vcov = vcovCL, cluster = ~ race + year)

microbenchmark(est_feols(), est_felm(), est_sandwich(), times = 3)
#> Unit: milliseconds
#>            expr       min        lq      mean    median        uq       max neval cld
#>     est_feols()  11.94122  11.96158  12.55835  11.98193  12.86692  13.75191     3 a
#>      est_felm()  87.18064  95.89905 100.69589 104.61746 107.45352 110.28957     3  b
#>  est_sandwich() 176.43502 183.95964 188.50271 191.48425 194.53656 197.58886     3   c
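For what it's worth, the three approaches should give identical point estimates, and the two-way clustered SEs should agree up to each package's default small-sample correction. A quick comparison sketch; the coefficient-table field names below are assumptions on my part, so adjust if your package versions store them differently:

## Point estimates on `age` should be identical; the clustered SEs should agree
## up to differing finite-sample corrections. (Field names below are assumed.)
rbind(
  feols    = summary(est_feols())$coeftable["age", 1:2],
  felm     = summary(est_felm())$coefficients["age", 1:2],
  sandwich = est_sandwich()["age", 1:2]
)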