I would like to estimate covariate effects on a response whose values lie in [0,1], that is, between 0 and 1 inclusive. I would like to use the fractional logit model described by Papke and Wooldridge (1996):
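For reference, here is a sketch of the model in the usual notation. x_i is a row vector of covariates for observation i, and choosing the logistic CDF for G is what makes it a "fractional logit". Only the conditional mean is specified, and β is estimated by maximizing the Bernoulli quasi-log-likelihood summed over observations:

```latex
E(y_i \mid x_i) = G(x_i\beta) = \frac{\exp(x_i\beta)}{1+\exp(x_i\beta)}, \qquad 0 \le y_i \le 1

\ell_i(\beta) = y_i \log G(x_i\beta) + (1-y_i)\log\bigl[1 - G(x_i\beta)\bigr],
\qquad \hat\beta = \arg\max_{\beta} \sum_{i=1}^{N} \ell_i(\beta)
```

Because \ell_i is just the binomial log-likelihood evaluated at a fractional y_i, a binomial-family GLM produces the same point estimates.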
http://faculty.smu.edu/millimet/classes/eco6375/papers/papke%20wooldridge%201996.pdf
Is there an R function (or package) to facilitate estimation of the fractional logit model? Could I modify glm() in some way?
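A minimal sketch of the glm() route, on simulated data. The variables x1 and x2, the beta-distributed response and the seed are all hypothetical choices for illustration. No modification of glm() seems to be needed, only a family that accepts a response in [0, 1]:

```r
## Hypothetical simulated data: two covariates and a response in [0, 1]
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- runif(n)
mu <- plogis(-0.5 + 0.8 * x1 - 1.2 * x2)                 # logit mean E(y | x)
y  <- rbeta(n, shape1 = 5 * mu, shape2 = 5 * (1 - mu))   # beta draw with mean mu (arbitrary choice)
dat <- data.frame(y, x1, x2)

## Fractional logit point estimates: an ordinary glm() call suffices.
## quasibinomial() accepts non-integer responses without a warning.
fit_q <- glm(y ~ x1 + x2, family = quasibinomial(link = "logit"), data = dat)

## binomial() maximizes the same Bernoulli quasi-likelihood, so its
## coefficients are identical; it only warns about non-integer "successes"
fit_b <- suppressWarnings(
  glm(y ~ x1 + x2, family = binomial(link = "logit"), data = dat)
)

cbind(quasibinomial = coef(fit_q), binomial = coef(fit_b))
```

Both columns should agree; the two families differ only in the standard errors they report.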
I appreciate @Jibler's comment: it recovers the fractional logit coefficient estimates (the betas) correctly. However, as @Ben pointed out, the standard errors (SEs) will not be estimated correctly under this specification.
I suppose this model is more popular in economics, which is why it is well discussed by Stata Journal contributors: http://fmwww.bc.edu/EC-C/S2013/823/EC823.S2013.nn06.slides.pdf http://www.stata.com/meeting/germany10/germany10_buis.pdf
I was able to obtain the data from the Papke and Wooldridge 401(k) plan example (see below). It appears, to me at least, that the robustness of the fractional logit model comes from the sandwich estimator of the variance, equation (9) of Papke and Wooldridge. That said, equation (10) goes on to show how robust SEs may also be obtained by multiplying the estimated vcov matrix from a standard glm(..., family=binomial(link=logit)) fit by a scalar dispersion factor estimated from the squared Pearson residuals (this relies on Var(y|x) being proportional to G(1-G)).
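For concreteness, here is a sketch of the two variance estimators for the logit link, in generic textbook notation that may not match the symbols of the paper's equations (9) and (10) exactly. N is the number of observations and K the number of coefficients; \hat A^{-1} on its own is the model-based vcov that the binomial GLM reports:

```latex
\hat G_i = \frac{\exp(x_i\hat\beta)}{1+\exp(x_i\hat\beta)}, \qquad \hat u_i = y_i - \hat G_i

\hat A = \sum_{i=1}^{N} \hat G_i(1-\hat G_i)\, x_i' x_i, \qquad
\hat B = \sum_{i=1}^{N} \hat u_i^2\, x_i' x_i

\text{sandwich:} \quad \widehat{\operatorname{Avar}}(\hat\beta) = \hat A^{-1}\hat B\,\hat A^{-1}

\text{Pearson-scaled:} \quad
\hat\sigma^2 = \frac{1}{N-K}\sum_{i=1}^{N}\frac{\hat u_i^2}{\hat G_i(1-\hat G_i)}, \qquad
\widehat{\operatorname{Avar}}(\hat\beta) = \hat\sigma^2\,\hat A^{-1}
```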
The slides by Buis seem to implement a sandwich form of the fractional logit estimator using the option vce(robust). These results match exactly what the sandwich() function in R gives when applied to the standard binomial GLM. I assume, but am not sure, as I'm not a Stata expert, that this is the same as Baum's approach of simply adding the robust option? If anyone has Stata and could check, that would be helpful. The family=quasibinomial GLM gives very slightly different SE estimates, but it too seems to be a reasonable estimator of both the mean and variance parameters of the fractional logit model (its SEs assume Var(y|x) is proportional to G(1-G), whereas the sandwich SEs do not).
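To see where the two sets of SEs come from, here is a base-R sketch that computes both the sandwich variance and the Pearson-scaled variance by hand from a binomial-family fit. The data are simulated; x1, y, the seed and the beta distribution for y are hypothetical. The scaled version should reproduce vcov() of the quasibinomial fit up to the IRLS convergence tolerance, and the sandwich version is compared with sandwich::sandwich() only if that package is installed:

```r
## Hypothetical simulated data: one covariate, response in [0, 1]
set.seed(42)
n  <- 400
x1 <- rnorm(n)
mu <- plogis(0.3 + 0.7 * x1)
y  <- rbeta(n, shape1 = 4 * mu, shape2 = 4 * (1 - mu))

## Tight IRLS tolerance so hand-computed and glm() results agree closely
ctrl  <- glm.control(epsilon = 1e-12, maxit = 100)
fit_b <- suppressWarnings(glm(y ~ x1, family = binomial(link = "logit"), control = ctrl))
fit_q <- glm(y ~ x1, family = quasibinomial(link = "logit"), control = ctrl)

X    <- model.matrix(fit_b)                    # N x K design matrix
Ghat <- fitted(fit_b)                          # fitted means G(x_i b)
uhat <- y - Ghat                               # raw residuals
A    <- crossprod(X, X * (Ghat * (1 - Ghat)))  # sum_i G(1-G) x_i' x_i
B    <- crossprod(X, X * uhat^2)               # sum_i u_i^2 x_i' x_i
Ainv <- solve(A)

## Sandwich (HC0) variance: A^{-1} B A^{-1}
V_sand <- Ainv %*% B %*% Ainv

## Pearson-scaled variance: sigma2 * A^{-1}
sigma2  <- sum(uhat^2 / (Ghat * (1 - Ghat))) / df.residual(fit_b)
V_scale <- sigma2 * Ainv

## The scaled version should match what the quasibinomial fit reports
all.equal(V_scale, vcov(fit_q), check.attributes = FALSE, tolerance = 1e-5)

## Optional cross-check against the sandwich package, if installed
if (requireNamespace("sandwich", quietly = TRUE)) {
  print(all.equal(V_sand, sandwich::sandwich(fit_b),
                  check.attributes = FALSE, tolerance = 1e-5))
}

cbind(scaled_se = sqrt(diag(V_scale)), sandwich_se = sqrt(diag(V_sand)))
```

The two sets of SEs should be close when Var(y|x) is roughly proportional to G(1-G); when that assumption fails, only the sandwich version remains consistent.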
Below is some R code that replicates the fit reported in the Buis slides above (it also shows that the quasi-binomial model gives slightly different SE estimates):
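##
## Replicate what some Stata Journal editors call "fractional logit"
## Data: "http://fmwww.bc.edu/repec/bocode/k/k401.dta"
##
library(sandwich)
library(foreign)
## read.dta() accepts a URL; use a local path instead if you downloaded the file
X <- read.dta("http://fmwww.bc.edu/repec/bocode/k/k401.dta")
class(X)
names(X)
dim(X)
## Rescale total employment to units of 10,000 employees
X$totemp1 <- X$totemp/10000
## Binomial GLM on the fractional response prate: R warns about
## "non-integer #successes", but the coefficients are the fractional logit estimates
glmfit <- glm(prate ~ mrate + totemp1 + age + sole, family=binomial(link=logit), data=X)
summary(glmfit)
##
## The model-based SEs here assume Var(y|x) = G(1-G); they are off and biased large
## Use the sandwich estimator instead
##
sand_vcov <- sandwich(glmfit)
sand_se <- sqrt(diag(sand_vcov))
robust_z <- coef(glmfit)/sand_se
robust_z
## Two-sided p-values from the normal approximation
robust_p <- 2 * pnorm(-abs(robust_z))
robust_p
##
## Quasi-binomial fit: same coefficients, SEs close to the sandwich ones
##
flogit1 <- glm(prate ~ mrate + totemp1 + age + sole, family=quasibinomial(link=logit), data=X)
summary(flogit1)
## Side-by-side comparison of the three sets of SEs
cbind(binomial = sqrt(diag(vcov(glmfit))),
      sandwich = sand_se,
      quasibinomial = sqrt(diag(vcov(flogit1))))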
So... thanks @Ben for the useful suggestions. My take is that either family=quasibinomial or the sandwich package does a good job of estimating robust SEs for the fractional logit model in R (as defined by equations (9) or (10) of Papke and Wooldridge). I'd appreciate comments or criticisms if this conclusion is not true.