I have a bunch of Stata .dta files that I would like to use in R.
My problem is that the variable names are not helpful to me as they are like "q0100," "q0565," "q0500," and "q0202." However, they are labelled like "psu," "number of pregnant," "head of household," and "waypoint."
I would like to be able to grab the labels ("psu," "waypoint," etc. . .) and use them as my variable/column names as those will be easier for me to work with.
Is there a way to do this, either preferably in R, or through Stata itself? I know of read.dta in library(foreign) but don't know if it can convert the labels into variable names.
R does not have a built in way to handle variable labels. Personally I think that this is disadvantage that should be fixed. Hmisc does provide some facilitiy for hadling variable labels, but the labels are only recognized by functions in that package. read.dta creates a data.frame with an attribute "var.labels" which contains the labeling information. You can then create a data dictionary from that.
> data(swiss)
> write.dta(swiss,swissfile <- tempfile())
> a <- read.dta(swissfile)
>
> var.labels <- attr(a,"var.labels")
>
> data.key <- data.frame(var.name=names(a),var.labels)
> data.key
var.name var.labels
1 Fertility Fertility
2 Agriculture Agriculture
3 Examination Examination
4 Education Education
5 Catholic Catholic
6 Infant_Mortality Infant.Mortality
Of course this .dta file doesn't have very interesting labels, but yours should be more meaningful.