Escaping Escape Characters

Alix Axel · May 20, 2010 · Viewed 26.6k times

I'm trying to mimic the json_encode bitmask flags introduced in PHP 5.3.0. Here is the string I have:

$s = addslashes('O\'Rei"lly'); // O\'Rei\"lly

Doing json_encode($s, JSON_HEX_APOS | JSON_HEX_QUOT) outputs the following:

"O\\\u0027Rei\\\u0022lly"

And I'm currently doing this in PHP versions older than 5.3.0:

str_replace(array('\\"', "\\'"), array('\\u0022', '\\\u0027'), json_encode($s))
or
str_replace(array('\\"', '\\\''), array('\\u0022', '\\\u0027'), json_encode($s))

Which correctly outputs the same result:

"O\\\u0027Rei\\\u0022lly"

I'm having trouble understanding why I need to replace single quotes ('\\\'' or even "\\'" [surrounding quotes excluded]) with '\\\u0027' and not just '\\u0027'.


Here is the code that I'm having trouble porting to PHP < 5.3:

if (get_magic_quotes_gpc() && version_compare(PHP_VERSION, '6.0.0', '<'))
{
    /* JSON_HEX_APOS and JSON_HEX_QUOT are available */
    if (version_compare(PHP_VERSION, '5.3.0', '>=') === true)
    {
        $_GET = json_encode($_GET, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_POST = json_encode($_POST, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_COOKIE = json_encode($_COOKIE, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_REQUEST = json_encode($_REQUEST, JSON_HEX_APOS | JSON_HEX_QUOT);
    }

    /* mimic the behaviour of JSON_HEX_APOS and JSON_HEX_QUOT */
    else if (extension_loaded('json') === true)
    {
        $_GET = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_GET));
        $_POST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_POST));
        $_COOKIE = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_COOKIE));
        $_REQUEST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_REQUEST));
    }

    $_GET = json_decode(stripslashes($_GET));
    $_POST = json_decode(stripslashes($_POST));
    $_COOKIE = json_decode(stripslashes($_COOKIE));
    $_REQUEST = json_decode(stripslashes($_REQUEST));
}

Answer

awatts · May 27, 2010

The PHP string

'O\'Rei"lly'

is just PHP's way of getting the literal value

O'Rei"lly

into a string. Calling addslashes on that string turns it into exactly the following 11 characters:

O\'Rei\"lly

i.e. strlen(addslashes('O\'Rei"lly')) == 11
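
If you want to check the lengths yourself, a minimal sketch like this should do (the variable names are only for illustration):

```php
<?php
// The PHP literal holds 9 characters: O'Rei"lly
$raw = 'O\'Rei"lly';

// addslashes() puts a backslash in front of each quote, adding 2 characters
$slashed = addslashes($raw);

echo $raw, ' (', strlen($raw), ")\n";
echo $slashed, ' (', strlen($slashed), ")\n";
```

The backslash in the source literal never reaches the string; only the two backslashes added by addslashes do.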

This is the value which is being sent to json_encode.

In JSON the backslash is an escape character, so a literal backslash must itself be escaped, i.e.

\ to be \\

Single and double quotes can also cause problems, and converting them to their Unicode escape sequences is one way to avoid that. So later versions of PHP's json_encode (5.3.0 and up, with the JSON_HEX_APOS and JSON_HEX_QUOT flags) change

' to be \u0027

and

" to be \u0022

So applying these three rules to

O\'Rei\"lly

gives us

O\\\u0027Rei\\\u0022lly
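
The three rules can be applied by hand to confirm the result. This is only a sketch of those rules using strtr, not a reimplementation of json_encode, and the $rules and $body names are made up for the example:

```php
<?php
$slashed = addslashes('O\'Rei"lly');   // O\'Rei\"lly

// strtr() with an array scans the input once and never re-scans what it
// has already replaced, so the backslashes it inserts are not escaped again.
$rules = array(
    '\\' => '\\\\',     // \ becomes \\
    "'"  => '\\u0027',  // ' becomes \u0027
    '"'  => '\\u0022',  // " becomes \u0022
);

$body = strtr($slashed, $rules);
echo $body, "\n";              // O\\\u0027Rei\\\u0022lly
echo '"', $body, '"', "\n";    // "O\\\u0027Rei\\\u0022lly"
```

I used strtr rather than three separate replacements because the order would otherwise matter: escaping the backslash last would also double the backslashes that belong to \u0027 and \u0022.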

This string is then wrapped in double quotes to make it a JSON string. Your replace expressions include the leading backslashes. Whether by accident or on purpose, this means that the leading and trailing double quotes returned by json_encode are not subject to the escaping, which is as it should be.

So in earlier versions of PHP

$s = addslashes('O\'Rei"lly');
print json_encode($s);

would print

"O\\'Rei\\\"lly"

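and we want to change ' to \u0027 and \" to \u0022. The \ in \" is there only to get the " inside a string that begins and ends with double quotes, so it disappears along with the quote. The \ in front of ' is different: it is the second half of the escaped backslash \\, so a search for \' removes it and the replacement has to put it back, which is why it is \\u0027 rather than \u0027.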
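
The replacement can be traced on the example string with a short sketch. I have written the second replacement as '\\\\u0027', which PHP reads as exactly the same 7 characters as your '\\\u0027', because inside single quotes only \\ and \' are escape sequences. The variable names are just for the example:

```php
<?php
$s = addslashes('O\'Rei"lly');        // O\'Rei\"lly

// Without the flags, json_encode() escapes \ and " but leaves ' alone.
$plain = json_encode($s);             // "O\\'Rei\\\"lly"

// Search for \" and \' (backslash included), so the outer quotes that
// delimit the JSON string never match.
$search  = array('\\"', "\\'");

// \" is replaced by \u0022. Matching \' swallows one backslash of the
// escaped \\ pair, so that replacement is the 7 characters \\u0027.
$replace = array('\\u0022', '\\\\u0027');

$mimic = str_replace($search, $replace, $plain);
echo $mimic, "\n";                    // "O\\\u0027Rei\\\u0022lly"

// On PHP >= 5.3 the result can be compared with the real flags.
var_dump($mimic === json_encode($s, JSON_HEX_APOS | JSON_HEX_QUOT));
```

Note that this only works because addslashes (or magic quotes) puts a backslash in front of every quote. A ' with no backslash in front of it would be left untouched by this search.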

So that's why we get

"O\\\u0027Rei\\\u0022lly"