How to read text files and create a data frame in R

Sheldon · Oct 28, 2015

I need to read the text file at https://raw.githubusercontent.com/fonnesbeck/Bios6301/master/datasets/addr.txt and convert it into an R data frame with the columns LastName, FirstName, streetno, streetname, city, state, and zip.

I tried using the sep argument to separate the fields, but that failed.

Answer

eipi10 · Oct 29, 2015

Expanding on my comments, here's another approach. You may need to tweak some of the code if your full data set has a wider range of patterns to account for.

library(stringr) # For str_trim 

# Read string data and split into data frame
dat = readLines("addr.txt")
dat = as.data.frame(do.call(rbind, strsplit(dat, split=" {2,}")), stringsAsFactors=FALSE)
names(dat) = c("LastName", "FirstName", "address", "city", "state", "zip")

# Separate address into number and street (if streetno isn't always numeric,
# or if you don't want it to be numeric, then just remove the as.numeric wrapper).
dat$streetno = as.numeric(gsub("([0-9]{1,4}).*","\\1",  dat$address))
dat$streetname = gsub("[0-9]{1,4} (.*)","\\1",  dat$address)

# Clean up zip
dat$zip = gsub("O","0", dat$zip)
dat$zip = str_trim(dat$zip)

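# Reorder columns by name, dropping the combined address column
dat = dat[, c("LastName", "FirstName", "streetno", "streetname", "city", "state", "zip")]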

dat
      LastName  FirstName streetno           streetname       city state        zip
1        Bania  Thomas M.      725    Commonwealth Ave.     Boston    MA      02215
2      Barnaby      David      373        W. Geneva St.   Wms. Bay    WI      53191
3       Bausch       Judy      373        W. Geneva St.   Wms. Bay    WI      53191
...
41      Wright       Greg      791  Holmdel-Keyport Rd.    Holmdel    NY 07733-1988
42     Zingale    Michael     5640        S. Ellis Ave.    Chicago    IL      60637
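
If you want to try the idea without downloading the file, here is a minimal self-contained sketch. The three sample lines are made up from the rows printed above, so the spacing in the real addr.txt may differ. The field-count check, the use of base R's trimws() in place of str_trim, and keeping streetno as character are additions for illustration, not part of the code above.

```r
# Hypothetical sample lines, loosely modelled on the rows printed above;
# the real addr.txt may be laid out differently.
lines <- c(
  "Bania      Thomas M.    725 Commonwealth Ave.    Boston    MA  O2215 ",
  "Barnaby    David        373 W. Geneva St.        Wms. Bay  WI  53191",
  "Wright     Greg         791 Holmdel-Keyport Rd.  Holmdel   NY  07733-1988"
)

# Split each line on runs of two or more spaces
fields <- strsplit(lines, split = " {2,}")

# rbind() recycles short rows with only a warning, so check the
# number of fields per line before building the data frame
bad_rows <- which(lengths(fields) != 6)
if (length(bad_rows) > 0) {
  stop("Lines that did not split into 6 fields: ",
       paste(bad_rows, collapse = ", "))
}

dat <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(dat) <- c("LastName", "FirstName", "address", "city", "state", "zip")

# Keep streetno as character so non-numeric house numbers do not become NA
dat$streetno   <- sub("^([0-9]+) .*$", "\\1", dat$address)
dat$streetname <- sub("^[0-9]+ (.*)$", "\\1", dat$address)

# Clean up zip: trim whitespace, then replace a letter O typed for a zero
dat$zip <- gsub("O", "0", trimws(dat$zip))

dat <- dat[, c("LastName", "FirstName", "streetno", "streetname",
               "city", "state", "zip")]
```

The check on lengths(fields) is the part most likely to be useful on the full file: any line listed in the error message has a pattern the split does not handle, and that is where the regular expressions would need tweaking.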