How to set up weighted least squares in R for heteroscedastic data?

Lucas De Abreu Maia · Aug 15, 2013

I'm running a regression on census data where my dependent variable is life expectancy and I have eight independent variables. The data is aggregated by city, so I have many thousands of observations.

My model is somewhat heteroscedastic, though. I want to run a weighted least-squares regression in which each observation is weighted by the city’s population. In this case, that would mean weighting the observations by the inverse of the square root of the population. It’s unclear to me, however, what the right syntax is. Currently, I have:

Model=lm(…,weights=(1/population))

Is that correct? Or should it be:

Model=lm(…,weights=(1/sqrt(population)))

(I found this question here: "Weighted Least Squares - R", but it does not clarify how R interprets the weights argument.)

Answer

Drew Steen · Aug 15, 2013

From ?lm: "weights: an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used." R doesn't do any further interpretation of the weights argument.
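One way to check what "minimizing sum(w*e^2)" means in practice: a weighted fit should give the same coefficients as an ordinary fit after every column, including the intercept, is scaled by sqrt(w). This is a minimal sketch on simulated data; the variables x, y and w are made up for illustration and are not from the question.

```r
set.seed(1)
n <- 200
x <- runif(n)
w <- runif(n, 0.5, 2)            # hypothetical weights, illustration only
y <- 1 + 2 * x + rnorm(n)

# Weighted fit: minimizes sum(w * e^2)
fit_w <- lm(y ~ x, weights = w)

# Ordinary fit on rows scaled by sqrt(w); the intercept column becomes s
s <- sqrt(w)
fit_s <- lm(I(s * y) ~ 0 + s + I(s * x))

# Compare coefficients (names differ, so drop them first)
same_coefs <- all.equal(unname(coef(fit_w)), unname(coef(fit_s)))
print(same_coefs)
```

The weights enter the criterion directly, so passing w is equivalent to multiplying each residual by sqrt(w) before squaring.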

So, if what you want to minimize is the sum of (the squared vertical distance from each point to the fitted line, i.e. the squared residual * 1/sqrt(population)), then you want ...weights=(1/sqrt(population)). If you want to minimize the sum of (the squared residual * 1/population), then you want ...weights=1/population.
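To see the two choices side by side, here is a rough sketch on simulated data. The data frame cities, its columns life_exp, income and population, the helper wrss, and the noise model are all hypothetical stand-ins, not the asker's census data. Each fit should achieve the smaller weighted sum of squares under its own weights, whatever the data look like.

```r
set.seed(42)
n <- 500
cities <- data.frame(
  population = round(exp(runif(n, log(1e3), log(1e6)))),
  income     = rnorm(n, mean = 50, sd = 10)
)
# Arbitrary noise model, chosen only so the data are heteroscedastic
cities$life_exp <- 60 + 0.2 * cities$income +
  rnorm(n, sd = 0.002 * sqrt(cities$population))

fit_inv      <- lm(life_exp ~ income, data = cities, weights = 1 / population)
fit_inv_sqrt <- lm(life_exp ~ income, data = cities, weights = 1 / sqrt(population))

# Weighted residual sum of squares of a fit under a given weight vector
wrss <- function(fit, w) sum(w * residuals(fit)^2)

w_inv      <- 1 / cities$population
w_inv_sqrt <- 1 / sqrt(cities$population)

# Rows: fits; columns: the criterion each fit is scored on
round(rbind(
  fit_inv      = c(inv = wrss(fit_inv, w_inv),      inv_sqrt = wrss(fit_inv, w_inv_sqrt)),
  fit_inv_sqrt = c(inv = wrss(fit_inv_sqrt, w_inv), inv_sqrt = wrss(fit_inv_sqrt, w_inv_sqrt))
), 6)
```

In the resulting table, the diagonal entry should be the smallest in each column. This only confirms what each call optimizes; it says nothing about which weighting suits the data.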

As to which of those is most appropriate... that's a question for Cross Validated!