I'm trying to mimic the json_encode bitmask flags implemented in PHP 5.3.0. Here is the string I have:
$s = addslashes('O\'Rei"lly'); // O\'Rei\"lly
Doing json_encode($s, JSON_HEX_APOS | JSON_HEX_QUOT) outputs the following:
"O\\\u0027Rei\\\u0022lly"
And I'm currently doing this in PHP versions older than 5.3.0:
str_replace(array('\\"', "\\'"), array('\\u0022', '\\\u0027'), json_encode($s))
or
str_replace(array('\\"', '\\\''), array('\\u0022', '\\\u0027'), json_encode($s))
Which correctly outputs the same result:
"O\\\u0027Rei\\\u0022lly"
I'm having trouble understanding why I need to replace single quotes ('\\\'' or even "\\'" [surrounding quotes excluded]) with '\\\u0027' and not just '\\u0027'.
Here is the code that I'm having trouble porting to PHP < 5.3:
if (get_magic_quotes_gpc() && version_compare(PHP_VERSION, '6.0.0', '<'))
{
    /* JSON_HEX_APOS and JSON_HEX_QUOT are available */
    if (version_compare(PHP_VERSION, '5.3.0', '>=') === true)
    {
        $_GET = json_encode($_GET, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_POST = json_encode($_POST, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_COOKIE = json_encode($_COOKIE, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_REQUEST = json_encode($_REQUEST, JSON_HEX_APOS | JSON_HEX_QUOT);
    }
    /* mimic the behaviour of JSON_HEX_APOS and JSON_HEX_QUOT */
    else if (extension_loaded('json') === true)
    {
        $_GET = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_GET));
        $_POST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_POST));
        $_COOKIE = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_COOKIE));
        $_REQUEST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_REQUEST));
    }
    $_GET = json_decode(stripslashes($_GET));
    $_POST = json_decode(stripslashes($_POST));
    $_COOKIE = json_decode(stripslashes($_COOKIE));
    $_REQUEST = json_decode(stripslashes($_REQUEST));
}
The PHP string 'O\'Rei"lly' is just PHP's way of getting the literal value O'Rei"lly into a string that can be used. Calling addslashes on that string changes it to literally the following 11 characters:
O\'Rei\"lly
i.e. strlen(addslashes('O\'Rei"lly')) == 11
This is the value that is being sent to json_encode.
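The character counts above can be checked directly. This is only a minimal sketch; the variable names are illustrative and nothing here depends on the PHP version.

```php
<?php
// The PHP literal below holds the 9 characters O'Rei"lly.
$raw = 'O\'Rei"lly';

// addslashes() puts one backslash in front of each quote character.
$s = addslashes($raw);

echo strlen($raw), "\n"; // 9
echo $s, "\n";           // O\'Rei\"lly
echo strlen($s), "\n";   // 11
```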
In JSON the backslash is an escape character, so it needs to be escaped itself, i.e. \ becomes \\
Single and double quotes can also cause problems, so converting them to their Unicode escape sequences is one way to avoid them. With the JSON_HEX_APOS and JSON_HEX_QUOT flags, json_encode in PHP 5.3.0 and later changes ' to \u0027 and " to \u0022
So applying these three rules to
O\'Rei\"lly
gives us
O\\\u0027Rei\\\u0022lly
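To make the three rules concrete, here is a rough sketch that applies them by hand in a single pass. The function name escape_for_json_hex is hypothetical, and it only covers the three characters discussed here (no control characters or non-ASCII handling), so it illustrates the rules rather than replacing json_encode.

```php
<?php
// Hypothetical helper: applies only the three rules described above.
function escape_for_json_hex($value)
{
    // strtr() with an array scans left to right and never rescans its own
    // output, so the backslashes it inserts are not escaped a second time.
    return strtr($value, array(
        '\\' => '\\\\',    // \ becomes \\
        "'"  => '\\u0027', // ' becomes \u0027
        '"'  => '\\u0022', // " becomes \u0022
    ));
}

$s = addslashes('O\'Rei"lly');      // O\'Rei\"lly
echo escape_for_json_hex($s), "\n"; // O\\\u0027Rei\\\u0022lly
```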
This string is then wrapped in double quotes to make it a JSON string. Your replace expressions include the leading backslashes. Either by accident or on purpose, this means that the leading and trailing double quotes returned by json_encode are not subject to the escaping, which is how it should be.
So in earlier versions of PHP
$s = addslashes('O\'Rei"lly');
print json_encode($s);
would print
"O\\'Rei\\\"lly"
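and we want to change ' to be \u0027 and \" to be \u0022. The \ in \" is only there to get the " into the JSON string, because that string begins and ends with double quotes, so it disappears along with the quote. The two backslashes in front of the ' are different: together they are the JSON escape for the single literal backslash that addslashes added, so both have to stay. A search pattern of \' swallows the second of them, and the replacement has to put it back, which is why it is \\u0027 (written '\\\u0027' in PHP source) and not just \u0027.
So that's why we get
"O\\\u0027Rei\\\u0022lly"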
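If the goal is a drop-in substitute on PHP < 5.3, one possible sketch is to post-process the output of plain json_encode with strtr instead of str_replace. The function name json_encode_hex_quotes is made up for this example, and it assumes the json extension is loaded. Mapping \\ to itself makes strtr consume escaped backslashes in pairs, so a \" search can never match the closing quote of a value that ends in a backslash, which a plain str_replace on \" would do.

```php
<?php
// Hypothetical stand-in for json_encode($value, JSON_HEX_APOS | JSON_HEX_QUOT)
// on PHP versions that lack those flags. Assumes the json extension is loaded.
function json_encode_hex_quotes($value)
{
    return strtr(json_encode($value), array(
        '\\\\' => '\\\\',   // keep an escaped backslash (\\) intact as a pair
        '\\"'  => '\\u0022', // escaped double quote \" becomes \u0022
        "'"    => '\\u0027', // bare apostrophe becomes \u0027
    ));
}

$s = addslashes('O\'Rei"lly');          // O\'Rei\"lly
echo json_encode_hex_quotes($s), "\n";  // "O\\\u0027Rei\\\u0022lly"
```

Because strtr does not rescan text it has already replaced, the backslash in each inserted \u0027 or \u0022 is left alone, and the quotes that delimit JSON strings are untouched since they are never preceded by an unpaired backslash.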