Run Stata do file from Python

svenkatesh picture svenkatesh · Jan 21, 2014 · Viewed 9.7k times · Source

I have a Python script that cleans up and performs basic statistical calculations on a large panel dataset (2,000,000+ observations).

I find that some of these tasks are better suited to Stata, and wrote a do file with the necessary commands. Thus, I want to run a .do file within my Python code. How would I go about calling a .do file from Python?

Answer

Roberto Ferrer picture Roberto Ferrer · Jan 22, 2014

I think @user229552 points in the correct direction. Python's subprocess module can be used. Below an example that works for me with Linux OS.

Suppose you have a Python file called pydo.py with the following:

import subprocess

## Do some processing in Python

## Set do-file information
dofile = "/home/roberto/Desktop/pyexample3.do"
cmd = ["stata", "do", dofile, "mpg", "weight", "foreign"]

## Run do-file
subprocess.call(cmd) 

and a Stata do-file named pyexample3.do, with the following:

clear all
set more off

local y `1'
local x1 `2'
local x2 `3'

display `"first parameter: `y'"'
display `"second parameter: `x1'"'
display `"third parameter: `x2'"'

sysuse auto
regress `y' `x1' `x2'

exit, STATA clear

Then executing pydo.py in a Terminal window works as expected.

You could also define a Python function and use that:

## Define a Python function to launch a do-file 
def dostata(dofile, *params):
    ## Launch a do-file, given the fullpath to the do-file
    ## and a list of parameters.
    import subprocess    
    cmd = ["stata", "do", dofile]
    for param in params:
        cmd.append(param)
    return subprocess.call(cmd) 

## Do some processing in Python

## Run a do-file
dostata("/home/roberto/Desktop/pyexample3.do", "mpg", "weight", "foreign")

The complete call from a Terminal, with results:

roberto@roberto-mint ~/Desktop
$ python pydo.py

  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/   12.1   Copyright 1985-2011 StataCorp LP
  Statistics/Data Analysis            StataCorp
                                      4905 Lakeway Drive
                                      College Station, Texas 77845 USA
                                      800-STATA-PC        http://www.stata.com
                                      979-696-4600        [email protected]
                                      979-696-4601 (fax)


Notes:
      1.  Command line editing enabled

. do /home/roberto/Desktop/pyexample3.do mpg weight foreign 

. clear all

. set more off

. 
. local y `1'

. local x1 `2'

. local x2 `3'

. 
. display `"first parameter: `y'"'
first parameter: mpg

. display `"second parameter: `x1'"'
second parameter: weight

. display `"third parameter: `x2'"'
third parameter: foreign

. 
. sysuse auto
(1978 Automobile Data)

. regress `y' `x1' `x2'

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   69.75
       Model |   1619.2877     2  809.643849           Prob > F      =  0.0000
    Residual |  824.171761    71   11.608053           R-squared     =  0.6627
-------------+------------------------------           Adj R-squared =  0.6532
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4071

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0065879   .0006371   -10.34   0.000    -.0078583   -.0053175
     foreign |  -1.650029   1.075994    -1.53   0.130      -3.7955    .4954422
       _cons |    41.6797   2.165547    19.25   0.000     37.36172    45.99768
------------------------------------------------------------------------------

. 
. exit, STATA clear

Sources:

http://www.reddmetrics.com/2011/07/15/calling-stata-from-python.html

http://docs.python.org/2/library/subprocess.html

http://www.stata.com/support/faqs/unix/batch-mode/

A different route for using Python and Stata together can be found at

http://ideas.repec.org/c/boc/bocode/s457688.html

http://www.stata.com/statalist/archive/2013-08/msg01304.html