"<" character in JSON data is serialized to \u003c

gnychis picture gnychis · Feb 6, 2014 · Viewed 8.4k times · Source

I have a JSON object where the value of one element is a string. In this string there are the characters "<RPC>". I take this entire JSON object and in my ASP.NET server code, I perform the following to take the object named rpc_response and add it to the data in a POST response:

var serializer = new System.Web.Script.Serialization.JavaScriptSerializer();
HttpContext.Current.Response.AddHeader("Pragma", "no-cache");
HttpContext.Current.Response.AddHeader("Cache-Control", "private, no-cache");
HttpContext.Current.Response.AddHeader("Content-Disposition", "inline; filename=\"files.json\"");
HttpContext.Current.Response.Write(serializer.Serialize(rpc_response));
HttpContext.Current.Response.ContentType = "application/json";
HttpContext.Current.Response.StatusCode = 200;

After the object is serialized, I receive it on the other end (not a web browser), and that particular string looks like: \u003cRPC\u003e.

What can I do to prevent these (and other) characters from not being encoded properly, still being able to serialize my JSON object?

Answer

user2864740 picture user2864740 · Feb 6, 2014

The characters are being encoded "properly"!1 Use a working JSON library to correctly access the JSON data - it is a valid JSON encoding.

Escaping these characters prevents HTML injection via JSON - and makes the JSON XML-friendly. That is, even if the JSON is emited directly into JavaScript (as is done fairly often as JSON is a valid2 subset of JavaScript), it cannot be used to terminate the <script> element early because the relevant characters (e.g. <, >) are encoded within JSON itself.

The standard JavaScriptSerializer does not have the ability to change this behavior. Such escaping might be configurable (or different) in the Json.NET implementation - but, it shouldn't matter because a valid JSON client/library must understand the \u escapes.


1 From RFC 4627: The application/json Media Type for JavaScript Object Notation (JSON),

Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point ..

See also C# To transform Facebook Response to proper encoded string (which is also related to the JSON escaping).

2 There is a rare case when this does not hold, but ignoring (or accounting for) that..