When creating JSON data manually, how should I escape string fields? Should I use something like Apache Commons Lang's `StringEscapeUtilities.escapeHtml`, `StringEscapeUtilities.escapeXml`, or should I use `java.net.URLEncoder`?
The problem is that when I use `SEU.escapeHtml`, it doesn't escape quotes, and when I wrap the whole string in a pair of `'`s, malformed JSON is generated.
Ideally, find a JSON library in your language that you can feed some appropriate data structure to, and let it worry about how to escape things. It'll keep you much saner. If for whatever reason you don't have a library in your language, you don't want to use one (I wouldn't suggest this), or you're writing a JSON library, read on.
Escape it according to the RFC. JSON is pretty liberal: the only characters you must escape are `\`, `"`, and control codes (anything less than U+0020).
This structure of escaping is specific to JSON, so you'll need a JSON-specific function. All of the escapes can be written as `\uXXXX`, where `XXXX` is the UTF-16 code unit¹ for that character. There are a few shortcuts, such as `\\`, which work as well. (And they result in smaller and clearer output.)
For full details, see the RFC.
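As a rough illustration of the rules above, here is a minimal Java sketch. The class name `JsonEscaper` and the method `quote` are made up for this example and do not come from any library. It escapes only what must be escaped (`\`, `"`, and anything below U+0020), uses the short forms where JSON defines them, and passes everything else through unchanged. It is a sketch, not a substitute for a real JSON library.

```java
final class JsonEscaper {
    private JsonEscaper() {}

    /** Returns s wrapped in double quotes, with the mandatory JSON escapes applied. */
    static String quote(String s) {
        StringBuilder out = new StringBuilder(s.length() + 2);
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                // Characters that have a short two-character escape in JSON.
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\b': out.append("\\b");  break;
                case '\f': out.append("\\f");  break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control codes: backslash, 'u', four hex digits.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        // Everything else is legal as-is inside a JSON string.
                        out.append(c);
                    }
            }
        }
        out.append('"');
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: "He said \"hi\"\n"
        System.out.println(quote("He said \"hi\"\n"));
    }
}
```

Note that the result is delimited by double quotes. JSON does not accept single-quoted strings, so wrapping the value in `'` produces invalid output no matter how the contents are escaped.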
¹JSON's escaping is built on JS, so it uses `\uXXXX`, where `XXXX` is a UTF-16 code unit. For code points outside the BMP, this means encoding surrogate pairs, which can get a bit hairy. (Or you can just output the character directly, since JSON's encoded form is Unicode text and allows these particular characters.)
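In Java specifically, the surrogate-pair part is less hairy than it sounds, because a `char` is already a UTF-16 code unit. A hedged sketch, assuming you want ASCII-only output (the names `SurrogateEscaper` and `escapeNonAscii` are hypothetical, and this handles only the non-ASCII part, not quotes or control codes):

```java
final class SurrogateEscaper {
    private SurrogateEscaper() {}

    /** Replaces every UTF-16 code unit above U+007E with a backslash-u escape. */
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            // charAt returns one UTF-16 code unit, so a code point outside the
            // BMP arrives here as two chars (high surrogate, then low surrogate).
            char c = s.charAt(i);
            if (c > 0x7E) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP; Character.toChars yields its surrogate pair.
        String grin = new String(Character.toChars(0x1F600));
        System.out.println(escapeNonAscii("ok " + grin));
    }
}
```

Running `main` should print `ok \ud83d\ude00`: the single code point U+1F600 becomes two escapes, one per surrogate. In a language whose strings are indexed by code point rather than by UTF-16 code unit, you would have to compute the pair yourself.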