Using Stata Variable Labels in R

Jared picture Jared · Jan 28, 2010 · Viewed 14.2k times · Source

I have a bunch of Stata .dta files that I would like to use in R.

My problem is that the variable names are not helpful to me as they are like "q0100," "q0565," "q0500," and "q0202." However, they are labelled like "psu," "number of pregnant," "head of household," and "waypoint."

I would like to be able to grab the labels ("psu," "waypoint," etc. . .) and use them as my variable/column names as those will be easier for me to work with.

Is there a way to do this, either preferably in R, or through Stata itself? I know of read.dta in library(foreign) but don't know if it can convert the labels into variable names.

Answer

Ian Fellows picture Ian Fellows · Jan 28, 2010

R does not have a built in way to handle variable labels. Personally I think that this is disadvantage that should be fixed. Hmisc does provide some facilitiy for hadling variable labels, but the labels are only recognized by functions in that package. read.dta creates a data.frame with an attribute "var.labels" which contains the labeling information. You can then create a data dictionary from that.

> data(swiss)
> write.dta(swiss,swissfile <- tempfile())
> a <- read.dta(swissfile)
> 
> var.labels <- attr(a,"var.labels")
> 
> data.key <- data.frame(var.name=names(a),var.labels)
> data.key
          var.name       var.labels
1        Fertility        Fertility
2      Agriculture      Agriculture
3      Examination      Examination
4        Education        Education
5         Catholic         Catholic
6 Infant_Mortality Infant.Mortality

Of course this .dta file doesn't have very interesting labels, but yours should be more meaningful.