Semicolon as URL query separator

mykhal · Aug 14, 2010 · Viewed 35.1k times

Although it is strongly recommended (W3C source, via Wikipedia) that web servers support the semicolon as a separator of URL query items (in addition to the ampersand), this recommendation does not seem to be generally followed.

For example, compare the results of

        http://www.google.com/search?q=nemo&oe=utf-8

and

        http://www.google.com/search?q=nemo;oe=utf-8

(In the latter case, the semicolon is, or was at the time of writing, treated as an ordinary character, as if the URL were http://www.google.com/search?q=nemo%3Boe=utf-8.)

However, the first URL parsing library I tried handles the semicolon well:

>>> from urlparse import urlparse, parse_qs
>>> url = 'http://www.google.com/search?q=nemo;oe=utf-8'
>>> parse_qs(urlparse(url).query)
{'q': ['nemo'], 'oe': ['utf-8']}
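The session above uses the Python 2 `urlparse` module. As a rough sketch for comparison, the same check can be written against Python 3's `urllib.parse`. This assumes Python 3.10 or later (or an earlier release that has the `separator` parameter backported), where `parse_qs` splits only on `&` unless told otherwise. The variable names are illustrative.

```python
from urllib.parse import urlparse, parse_qs

url = "http://www.google.com/search?q=nemo;oe=utf-8"
query = urlparse(url).query  # "q=nemo;oe=utf-8"

# Default behaviour in recent Python 3: only "&" separates fields,
# so the semicolon stays inside the value of "q".
default_result = parse_qs(query)

# Opting in to the semicolon as the (single) separator.
semicolon_result = parse_qs(query, separator=";")

print(default_result)    # {'q': ['nemo;oe=utf-8']}
print(semicolon_result)  # {'q': ['nemo'], 'oe': ['utf-8']}
```

Note that `separator` takes one separator string, so a single call splits on either `&` or `;`, not both.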

What is the current status of accepting the semicolon as a separator, and what are the potential issues or interesting notes, from both the server and the client point of view?

Answer

geira · Nov 23, 2016

The W3C Recommendation from 1999 is obsolete. The current status, according to the 2014 W3C Recommendation, is that the semicolon is now illegal as a parameter separator:

To decode application/x-www-form-urlencoded payloads, the following algorithm should be used. [...] The output of this algorithm is a sorted list of name-value pairs. [...]

  1. Let strings be the result of strictly splitting the string payload on U+0026 AMPERSAND characters (&).

In other words, ?foo=bar;baz means the parameter foo will have the value bar;baz; whereas ?foo=bar;baz=sna should result in foo being bar;baz=sna (although technically illegal since the second = should be escaped to %3D).
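The two examples above can be reproduced with a minimal sketch of the splitting step. This is not the full algorithm from the Recommendation, only an illustration of splitting strictly on `&` and then on the first `=`. The helper name `decode_form` is hypothetical, and percent-decoding is delegated to Python's `urllib.parse.unquote_plus`.

```python
from urllib.parse import unquote_plus


def decode_form(payload):
    """Hypothetical helper: split strictly on '&', then on the first '='."""
    pairs = []
    for chunk in payload.split("&"):
        if not chunk:
            continue  # skip empty fields such as those produced by "a=1&&b=2"
        # Only the first "=" separates name from value; any later "="
        # and any ";" remain part of the value.
        name, _, value = chunk.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


print(decode_form("foo=bar;baz"))      # [('foo', 'bar;baz')]
print(decode_form("foo=bar;baz=sna"))  # [('foo', 'bar;baz=sna')]
print(decode_form("q=nemo&oe=utf-8"))  # [('q', 'nemo'), ('oe', 'utf-8')]
```

Under this reading, a client that wants a literal semicolon or equals sign inside a value is safest sending it percent-encoded (`%3B`, `%3D`), since older servers may still split on `;`.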