If I want to create a URL using a variable I have two choices to encode the string. urlencode()
and rawurlencode()
.
What exactly are the differences and which is preferred?
It will depend on your purpose. If interoperability with other systems is important then it seems rawurlencode is the way to go. The one exception is legacy systems which expect the query string to follow form-encoding style of spaces encoded as + instead of %20 (in which case you need urlencode).
rawurlencode follows RFC 1738 prior to PHP 5.3.0 and RFC 3986 afterwards (see http://us2.php.net/manual/en/function.rawurlencode.php)
Returns a string in which all non-alphanumeric characters except -_.~ have been replaced with a percent (%) sign followed by two hex digits. This is the encoding described in » RFC 3986 for protecting literal characters from being interpreted as special URL delimiters, and for protecting URLs from being mangled by transmission media with character conversions (like some email systems).
Note on RFC 3986 vs 1738. rawurlencode prior to php 5.3 encoded the tilde character (~
) according to RFC 1738. As of PHP 5.3, however, rawurlencode follows RFC 3986 which does not require encoding tilde characters.
urlencode encodes spaces as plus signs (not as %20
as done in rawurlencode)(see http://us2.php.net/manual/en/function.urlencode.php)
Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the » RFC 3986 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.
This corresponds to the definition for application/x-www-form-urlencoded in RFC 1866.
Additional Reading:
You may also want to see the discussion at http://bytes.com/groups/php/5624-urlencode-vs-rawurlencode.
Also, RFC 2396 is worth a look. RFC 2396 defines valid URI syntax. The main part we're interested in is from 3.4 Query Component:
Within a query component, the characters
";", "/", "?", ":", "@",
are reserved.
"&", "=", "+", ",", and "$"
As you can see, the +
is a reserved character in the query string and thus would need to be encoded as per RFC 3986 (as in rawurlencode).