Importing JSON into R with in-line quotation marks

WMC · Oct 7, 2014

I'm attempting to read a JSON file ("my_file.json") into R. The file contains the following:

[{"id":"484","comment":"They call me "Bruce""}]

Using the jsonlite package (0.9.12), the following fails:

library(jsonlite)
fromJSON(readLines('~/my_file.json'))

It produces this error:

"Error in parseJSON(txt) : lexical error: invalid char in json text.
84","comment":"They call me "Bruce""}]
           (right here) ------^"

Here is how R prints the contents of the file, with its own escaping applied:

readLines('~/my_file.json')

"[{\"id\":\"484\",\"comment\":\"They call me \"Bruce\"\"}]"

Removing the quotes around "Bruce" solves the problem, as in:

my_file.json

[{"id":"484","comment":"They call me Bruce"}]

But what is the issue with the escaping?

Answer

digEmAll · Oct 7, 2014

In R, string literals can be defined using single or double quotes, e.g.:

s1 <- 'hello'
s2 <- "world"

Of course, if you want to include double quotes inside a string literal that is itself delimited by double quotes, you need to escape the inner quotes with a backslash. Otherwise the R parser cannot tell where the string ends (the same holds for single quotes inside a single-quoted literal), e.g.:

s1 <- "Hello, my name is \"John\""

If you print this string on the console (using cat¹), or write it to a file, you get the actual "face" of the string, not its R literal representation, that is:

> cat("Hello, my name is \"John\"")
Hello, my name is "John"
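To make the difference between the literal and the actual string concrete, here is a small base R sketch. The variable names and the temporary file are only for illustration; they are not part of the original question.

```r
# A string literal with escaped inner quotes
s <- "They call me \"Bruce\""

print(s)      # displays the R literal form, backslashes included
cat(s, "\n")  # displays the actual characters
n <- nchar(s) # counts the actual characters: the backslashes are not stored
n

# Writing to a file stores the actual characters, without backslashes
tmp <- tempfile(fileext = ".txt")
writeLines(s, tmp)
file_text <- readLines(tmp)
identical(file_text, s)
```

The backslashes exist only in the source code of the literal. Anything that reads the string itself, such as a JSON parser, never sees them.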

The JSON parser reads the actual "face" of the string, so in your case it reads:

[{"id":"484","comment":"They call me "Bruce""}]

and not the R literal representation:

"[{\"id\":\"484\",\"comment\":\"They call me \"Bruce\"\"}]" 

That said, JSON has the same requirement as R: double quotes inside a JSON string must be escaped with a backslash.

Hence, the contents of your file should be changed to:

[{"id":"484","comment":"They call me \"Bruce\""}]

If you modify your file by adding these backslashes, you will be able to read the JSON without errors.
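As a quick check, the following sketch compares the original and the corrected file contents. It assumes a reasonably recent jsonlite that provides validate() and lets fromJSON() read from a file path; the temporary file stands in for my_file.json.

```r
library(jsonlite)

# Actual file contents, written as single-quoted R literals.
# Inside the literal, \\ stands for one real backslash.
bad  <- '[{"id":"484","comment":"They call me "Bruce""}]'
good <- '[{"id":"484","comment":"They call me \\"Bruce\\""}]'

validate(bad)   # the unescaped inner quotes make this invalid JSON
validate(good)  # the escaped version is valid JSON

# Round trip through a file, as in the question
tmp <- tempfile(fileext = ".json")
writeLines(good, tmp)
parsed <- fromJSON(tmp)
cat(parsed$comment, "\n")
```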

Note that the R literal representation of that corrected string needs two levels of escaping, one for JSON and one for R:

"[{\"id\":\"484\",\"comment\":\"They call me \\\"Bruce\\\"\"}]"

In fact, this works:

> fromJSON("[{\"id\":\"484\",\"comment\":\"They call me \\\"Bruce\\\"\"}]")
   id              comment
1 484 They call me "Bruce"
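If you are on R 4.0.0 or later, raw string literals avoid the second level of escaping, so the JSON can be pasted as it appears in the file. A minimal sketch (the variable names are illustrative):

```r
# Requires R >= 4.0.0 for the r"(...)" raw string syntax

# Regular literal: JSON escaping plus R escaping
escaped <- "[{\"id\":\"484\",\"comment\":\"They call me \\\"Bruce\\\"\"}]"

# Raw literal: only the JSON escaping remains
raw_lit <- r"([{"id":"484","comment":"They call me \"Bruce\""}])"

same <- identical(escaped, raw_lit)
same
cat(raw_lit, "\n")
```

Both literals denote the same string, so either one can be passed to fromJSON.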

¹ The default R print function (also invoked when you evaluate a value at the console) displays the corresponding R string literal. To display the actual string, use cat, or print(stringToPrint, quote = FALSE).


EDIT (regarding @EngrStudent's comment on the possibility of automating quote escaping):

A JSON parser cannot escape quotes automatically.
Put yourself in the computer's shoes and imagine you had to parse this (unescaped) string as JSON: { "foo1" : " : "foo2" : "foo3" }

I see at least three possible ways of escaping it that give valid JSON:
{ "foo1" : " : \"foo2\" : \"foo3" }
{ "foo1\" : " : "foo2\" : \"foo3" }
{ "foo1\" : \" : \"foo2" : "foo3" }

As this small example shows, explicit escaping is necessary to avoid ambiguity.
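You can confirm the ambiguity by parsing the three candidates: each one is accepted, but each yields a different key and value. This sketch assumes jsonlite is installed; the loop and variable names are only illustrative.

```r
library(jsonlite)

# The three candidate escapings, as single-quoted R literals
# (\\ inside the literal stands for one real backslash)
candidates <- c(
  '{ "foo1" : " : \\"foo2\\" : \\"foo3" }',
  '{ "foo1\\" : " : "foo2\\" : \\"foo3" }',
  '{ "foo1\\" : \\" : \\"foo2" : "foo3" }'
)

# Every candidate parses, but into a different key/value pair
keys <- character(0)
for (txt in candidates) {
  parsed <- fromJSON(txt)
  keys <- c(keys, names(parsed))
  cat("key:  ", names(parsed), "\n")
  cat("value:", parsed[[1]], "\n\n")
}
```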

If the string you want to fix has a very particular structure, one in which you can recognize without uncertainty which double quotes need escaping, you can write your own automatic escaping procedure. You will have to start from scratch, though, because there is nothing built in.
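For the file in the question, such a procedure might look like the sketch below. It is a hypothetical helper, not a general solution, and it rests on strong assumptions: "comment" is always the last field of each record, so its value ends at the first "} that follows, and the text contains no quotes that are already escaped (those would be escaped twice).

```r
# Hypothetical helper: escape the inner quotes of the "comment" value only.
# Assumes the value starts right after "comment":" and ends just before "}
escape_comment_quotes <- function(txt) {
  # Lookbehind/lookahead keep the delimiters out of the matched span
  inner <- '(?<="comment":").*?(?="\\})'
  m <- gregexpr(inner, txt, perl = TRUE)
  regmatches(txt, m) <- lapply(
    regmatches(txt, m),
    function(x) gsub('"', '\\\\"', x)  # replace each " with \"
  )
  txt
}

broken <- '[{"id":"484","comment":"They call me "Bruce""}]'
fixed  <- escape_comment_quotes(broken)
cat(fixed, "\n")
```

The result can then be passed to fromJSON. If your real data breaks either assumption, the pattern has to be adapted to whatever makes the quotes unambiguous in your case.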