I'm attempting to read the following JSON file ("my_file.json") into R, which contains the following:
[{"id":"484","comment":"They call me "Bruce""}]
using the jsonlite package (0.9.12), the following fails:
library(jsonlite)
fromJSON(readLines('~/my_file.json'))
receiving an error:
"Error in parseJSON(txt) : lexical error: invalid char in json text.
84","comment":"They call me "Bruce""}]
(right here) ------^"
Here is the output from R escaping of the file:
readLines('~/my_file.json')
"[{\"id\":\"484\",\"comment\":\"They call me \"Bruce\"\"}]"
Removing the quotes around "Bruce" solves the problem, as in:
my_file.json
[{"id":"484","comment":"They call me Bruce"}]
But what is the issue with the escapement?
In R strings literals can be defined using single or double quotes.
e.g.
s1 <- 'hello'
s2 <- "world"
Of course, if you want to include double quotes inside a string literal defined using double quotes you need to escape (using backslash) the inner quotes, otherwise the R code parser won't be able to detect the end of the string correctly (the same holds for single quote).
e.g.
s1 <- "Hello, my name is \"John\""
If you print (using cat
¹) this string on the console, or you write this string on a file you will get the actual "face" of the string, not the R literal representation, that is :
> cat("Hello, my name is \"John\"")
Hello, my name is "John"
The json parser, reads the actual "face" of the string, so, in your case json reads :
[{"id":"484","comment":"They call me "Bruce""}]
not (the R literal representation) :
"[{\"id\":\"484\",\"comment\":\"They call me \"Bruce\"\"}]"
That being said, also the json parser needs double-quotes escaping when you have quotes inside strings.
Hence, your string should be modified in this way :
[{"id":"484","comment":"They call me \"Bruce\""}]
If you simply modify your file by adding the backslashes you will be perfectly able to read the json.
Note that the corresponding R literal representation of that string would be :
"[{\"id\":\"484\",\"comment\":\"They call me \\\"Bruce\\\"\"}]"
in fact, this works :
> fromJSON("[{\"id\":\"484\",\"comment\":\"They call me \\\"Bruce\\\"\"}]")
id comment
1 484 They call me "Bruce"
¹
the default R print
function (invoked also when you simply press ENTER on a value) returns the corresponding R string literal. If you want to print the actual string, you need to use print(quote=F,stringToPrint)
, or cat
function.
EDIT (on @EngrStudent comment on the possibility to automatize quotes escaping) :
Json parser cannot do quotes escaping automatically.
I mean, try to put yourself in the computer's shoes and image you should parse this (unescaped) string as json: { "foo1" : " : "foo2" : "foo3" }
I see at least three possible escaping giving a valid json:
{ "foo1" : " : \"foo2\" : \"foo3" }
{ "foo1\" : " : "foo2\" : \"foo3" }
{ "foo1\" : \" : \"foo2" : "foo3" }
As you can see from this small example, escaping is really necessary to avoid ambiguities.
Maybe, if the string you want to escape has a really particular structure where you can recognize (without uncertainty) the double-quotes needing to be escaped, you can create your own automatic escaping procedure, but you need to start from scratch, because there's nothing built-in.