How do I avoid double URL encoding when rendering URLs in my website?

Cory Kendall · Apr 19, 2013 · Viewed 14.7k times

Users provide both properly escaped URLs and raw URLs to my website in a text input; for example I consider these two URLs equivalent:

https://www.cool.com/cool%20beans
https://www.cool.com/cool beans

Now I want to render these as <a> tags later, when viewing this data. I am stuck between encoding the given text and getting these links:

<a href="https://www.cool.com/cool%2520beans">   <!-- This one is broken! -->
<a href="https://www.cool.com/cool%20beans">

Or not encoding it and getting this:

<a href="https://www.cool.com/cool%20beans">
<a href="https://www.cool.com/cool beans">       <!-- This one is broken! -->

What's the best way out from a user experience standpoint with modern browsers? I'm torn between doing a decoding pass over their input, or the second option I listed above where we don't encode the href attribute.

Answer

Chris Brown · Apr 19, 2013

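To avoid double-encoding the links, you can run urldecode() on each link first and urlencode() the result afterwards. Decoding an unencoded URL such as "https://www.cool.com/cool beans" returns the same value, whereas decoding "https://www.cool.com/cool%20beans" turns the %20 back into a space. Either way you end up with the raw form, which can then be encoded exactly once. Note that urlencode() also escapes ":" and "/" and writes spaces as "+", so in practice apply rawurlencode() to each path segment rather than to the whole URL.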
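As a rough sketch of the decode-then-encode idea, the helper below splits off the scheme and host and re-encodes each path segment. The function name normalizeUrlPath() is hypothetical, and the sketch assumes an absolute http(s) URL with no query string or fragment; anything else is returned untouched.

```php
<?php
// Hypothetical helper: make sure the path of a URL is percent-encoded exactly once.
// Assumes an absolute http(s) URL without a query string or fragment.
function normalizeUrlPath(string $url): string
{
    // Capture "scheme://host" and the (optional) path separately.
    if (!preg_match('~^(https?://[^/]+)(/.*)?$~i', $url, $m)) {
        return $url; // unexpected shape: leave it alone
    }
    $path = $m[2] ?? '';

    // Decode each segment back to its raw form, then encode it once.
    $segments = array_map(
        function ($segment) {
            return rawurlencode(rawurldecode($segment));
        },
        explode('/', $path)
    );

    return $m[1] . implode('/', $segments);
}

// Both inputs from the question normalize to the same URL:
// normalizeUrlPath('https://www.cool.com/cool%20beans') gives https://www.cool.com/cool%20beans
// normalizeUrlPath('https://www.cool.com/cool beans')   gives https://www.cool.com/cool%20beans
```

One limitation to keep in mind: a raw input that contains a literal "%" followed by two hex digits is indistinguishable from an already-encoded one, so this approach always treats it as encoded.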

Alternatively, you could scan for encoded characters with the strpos() function, e.g.

// strpos() returns false when nothing is found; 0 is a valid position, so compare strictly
if (strpos($url, "%20") !== false) {
    // Encoded character found
}

Ideally you would scan for an array of common encoded characters rather than just "%20".
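A minimal sketch of that scan might look like the following. The function name containsEncodedChar() and the default list of sequences are assumptions for illustration, not an exhaustive set.

```php
<?php
// Hypothetical helper: report whether a URL contains any of a list of
// percent-encoded sequences. The default list is illustrative only.
function containsEncodedChar(
    string $url,
    array $needles = ['%20', '%21', '%23', '%25', '%26', '%2F', '%3A', '%3F']
): bool {
    foreach ($needles as $needle) {
        // stripos() is case-insensitive, so "%2f" and "%2F" both match.
        // Compare against false strictly, because position 0 is a valid hit.
        if (stripos($url, $needle) !== false) {
            return true;
        }
    }
    return false;
}
```

If a fixed list feels too narrow, a pattern check such as preg_match('/%[0-9A-Fa-f]{2}/', $url) === 1 would catch any percent-encoded byte, at the cost of also matching a literal "%" that happens to be followed by two hex digits.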