Regression loop and store coefficients

Chuan · Nov 24, 2015

I want (1) to run a regression repeatedly, changing a selection criterion each time; and (2) to store one particular coefficient from each regression. Here is an example:

clear
sysuse auto.dta
local x = 2000
while `x' < 5000 {
      xi: regress price mpg length gear_ratio i.foreign if weight < `x'
      est sto model_`x'
      local x = `x' + 100
}
est dir

I care about only one predictor, say mpg here. I want to extract the coefficient of mpg from each regression into a separate file (any format is OK; .dta would be great) to see whether there is a trend as the threshold for weight increases. What I am doing now is using estout to export the results, something like:

esttab * using test.rtf, replace se stats(r2_a N,  labels(R-squared)) starl(* 0.10 ** 0.05 *** 0.01) nogap onecell title(regression tables)

estout exports everything, so I then have to edit the results by hand. This works well for regressions with few predictors, but my real dataset has more than 30 variables and the loop will run at least 100 regressions (I have a variable Distance that ranges from 0 to 30,000; it plays the role of weight in the example). It is therefore really difficult for me to edit the results without making mistakes.

Is there a more efficient way to solve my problem? Because I am looping over a criterion rather than over a group variable, the statsby function does not seem to work well here.

Answer

Nick Cox · Nov 24, 2015

As @Todd has already suggested, you can just choose the particular results you care about and use postfile to store them as new variables in a new dataset. Note that a forval loop is more direct than your while code, and that xi: is superseded by factor variable notation in recent versions of Stata. (I have not changed that just in case you are using some older version.) Note also the evaluation of saved results such as _b[_cons] on the fly, and the use of parentheses () around each posted value so that a negative sign is not read as subtraction. Some code examples elsewhere store results temporarily in local macros or scalars, which is quite unnecessary.

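sysuse auto.dta, clear
tempname myresults
* declare the results dataset and its variables; replace allows reruns
postfile `myresults' threshold intercept gradient se using myresults.dta, replace
quietly forval x = 2000(200)4800 {
    xi: regress price mpg length gear_ratio i.foreign if weight < `x'
    // one observation per regression: threshold, intercept, mpg slope and its SE
    post `myresults' (`x') (`=_b[_cons]') (`=_b[mpg]') (`=_se[mpg]')
}
postclose `myresults'
* xi: has added indicator variables to the data in memory, so clear is needed
use myresults, clear
list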

     +---------------------------------------------+
     | thresh~d   intercept    gradient         se |
     |---------------------------------------------|
  1. |     2000    -3699.55   -296.8218   215.0348 |
  2. |     2200   -4175.722   -53.19774   54.51251 |
  3. |     2400   -3918.388   -58.83933   42.19707 |
  4. |     2600   -6143.622   -58.20153   38.28178 |
  5. |     2800   -11159.67   -49.21381   44.82019 |
     |---------------------------------------------|
  6. |     3000   -6636.524   -51.28141   52.96473 |
  7. |     3200   -7410.392   -58.14692   60.55182 |
  8. |     3400   -2193.125   -57.89508   52.78178 |
  9. |     3600   -1824.281   -103.4387   56.49762 |
 10. |     3800   -1192.767   -110.9302    51.6335 |
     |---------------------------------------------|
 11. |     4000     5649.41   -173.9975   74.51212 |
 12. |     4200    5784.363   -147.4454   71.89362 |
 13. |     4400     6494.47   -93.81158   80.81586 |
 14. |     4600     6494.47   -93.81158   80.81586 |
 15. |     4800    5373.041   -95.25342   82.60246 |
     +---------------------------------------------+
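Since the question asks whether the mpg coefficient shows a trend as the threshold rises, the stored results can be plotted rather than only listed. This is a minimal sketch, assuming myresults.dta was created by the postfile loop above and that the code is run from a do-file (the /// line continuation does not work interactively). The variables lower and upper are names introduced here for illustration, and the multiplier 1.96 is a normal approximation, not the exact t quantile for each regression.

```stata
* Sketch only: assumes myresults.dta exists from the postfile loop above
use myresults, clear

* Approximate 95% limits for the mpg coefficient (normal approximation;
* the exact t quantile differs between regressions and would be wider)
generate lower = gradient - 1.96 * se
generate upper = gradient + 1.96 * se

* Coefficient with its interval at each weight threshold
twoway rcap lower upper threshold || scatter gradient threshold, ///
    ytitle("Coefficient on mpg") xtitle("Upper limit on weight") legend(off)
```

The intervals are wide for the smallest thresholds because those regressions use very few cars, so any apparent trend at that end should be read with caution.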

statsby (a command, not a function) is just not designed for this problem at all, so it is not a question of whether it works well.
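For readers on Stata 11 or later, where factor-variable notation is available, the same loop can be written without xi:. The following is a sketch under that assumption, not a replacement for the code above. It uses the step of 100 from the question, and it posts an extra variable n (the number of observations used), which is an illustrative addition and not part of the original answer.

```stata
sysuse auto.dta, clear
tempname myresults

* n is an extra, illustrative column; replace overwrites any earlier file
postfile `myresults' threshold gradient se n using myresults.dta, replace

quietly forvalues x = 2000(100)4900 {
    * i.foreign is handled directly by regress; no xi: prefix and no
    * indicator variables added to the dataset
    regress price mpg length gear_ratio i.foreign if weight < `x'
    post `myresults' (`x') (`=_b[mpg]') (`=_se[mpg]') (`=e(N)')
}

postclose `myresults'
use myresults, clear
list
```

Keeping n alongside each coefficient makes it easier to see which thresholds rest on only a handful of observations. For a real dataset, the variable in the if condition and the limits in forvalues (for example Distance from the question) would need to be substituted.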