How to read \" double-quote escaped values with read.table in R

Alexandre Rademaker · Aug 15, 2011 · Viewed 8.3k times

I am having trouble reading a file in R that contains lines like the one below.

"_:b5507F4C7x59005","Fabiana D\"atri"

Any ideas? How can I make read.table understand that \" is an escaped quote?

Cheers, Alexandre

Answer

Tommy · Aug 15, 2011

It seems to me that read.table/read.csv cannot handle escaped quotes.
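
For context, the style of embedded quote that read.csv does parse with a comma separator is the CSV convention of doubling the quote. A minimal sketch, with an input string made up for illustration:

```r
# Made-up one-line input: the embedded quote is doubled ("") rather than
# backslash-escaped.
x <- read.csv(text = '1,"Fab D""atri"', header = FALSE, stringsAsFactors = FALSE)

# The doubled quote is read back as a single embedded quote.
print(x$V2)
```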

...But I think I have an (ugly) workaround, inspired by @nullglob:

  • First, read the file WITHOUT a quote character. (This won't handle commas embedded in quoted fields, as @Ben Bolker noted.)
  • Then go through the string columns, remove the surrounding quotes and unescape the embedded ones.

The test file looks like this (I added a non-string column for good measure):

13,"foo","Fab D\"atri","bar"
21,"foo2","Fab D\"atri2","bar2"

And here is the code:

# Generate test file
writeLines(c("13,\"foo\",\"Fab D\\\"atri\",\"bar\"",
             "21,\"foo2\",\"Fab D\\\"atri2\",\"bar2\"" ), "foo.txt")

# Read ignoring quotes
tbl <- read.table("foo.txt", as.is=TRUE, quote='', sep=',', header=FALSE, row.names=NULL)

# Go through the character columns and clean them up
for (i in seq_len(NCOL(tbl))) {
    if (is.character(tbl[[i]])) {
        x <- tbl[[i]]
        x <- gsub('^"|"$', '', x)         # Remove surrounding quotes, if present
        tbl[[i]] <- gsub('\\\\"', '"', x) # Unescape quotes (\" becomes ")
    }
}

The output is then correct:

> tbl
  V1   V2          V3   V4
1 13  foo  Fab D"atri  bar
2 21 foo2 Fab D"atri2 bar2
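
The workaround above still breaks on commas embedded in quoted fields. One possible alternative, sketched here and not part of the original answer, is to rewrite each \" as the doubled quote "" before parsing, since read.csv accepts doubled quotes inside quoted fields. The helper name read_backslash_csv and the test data are hypothetical, and the sketch assumes a backslash only ever appears in the file to escape a quote.

```r
# Hypothetical helper: convert backslash-escaped quotes (\") to CSV-style
# doubled quotes (""), then parse with read.csv's normal quote handling.
# Assumption: backslashes occur in the file only as quote escapes.
read_backslash_csv <- function(path, header = FALSE, ...) {
    lines <- readLines(path)
    # fixed = TRUE treats the pattern as the literal two characters \"
    lines <- gsub('\\"', '""', lines, fixed = TRUE)
    read.csv(text = lines, header = header, stringsAsFactors = FALSE, ...)
}

# Made-up test file with an escaped quote and an embedded comma
tmp <- tempfile(fileext = ".txt")
writeLines(c('13,"foo","Fab D\\"atri","bar, baz"',
             '21,"foo2","Fab D\\"atri2","bar2"'), tmp)

tbl2 <- read_backslash_csv(tmp)
print(tbl2)
```

Because the quotes stay active during parsing, the comma inside "bar, baz" is kept as part of the field instead of splitting it.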