urllib.quote() throws KeyError

Garfield · Feb 27, 2013

To encode the URI, I used urllib.quote("schönefeld"), but when non-ASCII characters exist in the string, it throws:

KeyError: u'\xe9'
Code: return ''.join(map(quoter, s))

My input strings are köln, brønshøj, schönefeld etc.

On Windows (Python 2.7, PyScripter IDE) I only tried printing the result; on Linux it raises the exception (I guess the platform doesn't matter).

This is what I am trying:

from commands import getstatusoutput
from urllib import quote

# Append the quoted query to the base URL
queryParams = "schönefeld"
cmdString = "http://baseurl" + quote(queryParams)
print getstatusoutput(cmdString)

Looking into the cause: in urllib.quote(), the exception is actually thrown at return ''.join(map(quoter, s)).

The code in urllib is:

def quote(s, safe='/'):
    if not s:
        if s is None:
            raise TypeError('None object cannot be quoted')
        return s
    cachekey = (safe, always_safe)
    try:
        (quoter, safe) = _safe_quoters[cachekey]
    except KeyError:
        safe_map = _safe_map.copy()
        safe_map.update([(c, c) for c in safe])
        quoter = safe_map.__getitem__
        safe = always_safe + safe
        _safe_quoters[cachekey] = (quoter, safe)
    if not s.rstrip(safe):
        return s
    return ''.join(map(quoter, s))

The exception comes from ''.join(map(quoter, s)): the quoter function is called for every element in s, and the resulting list is then joined with '' and returned.

For the non-ASCII character è, the equivalent key would be %E8, which is present in the _safe_map variable. But when I call quote('è'), it searches for the key \xe8, so the key does not exist and the exception is thrown.
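To make the lookup concrete, here is a simplified model of the table, not the real urllib code. As far as I can tell from the Python 2.7 source, _safe_map has one entry per byte value (0 to 255), where safe characters map to themselves and everything else maps to a %XX escape. The names TOY_MAP and toy_quote are made up for this sketch, and the table is keyed by integer byte value so that it behaves the same on Python 2 and 3.

```python
# Simplified stand-in for urllib's internal table (illustration only).
# Byte values that are left untouched, including the default safe='/'.
SAFE = frozenset(bytearray(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789_.-/"
))

# One entry per possible byte value, and nothing beyond that.
TOY_MAP = {
    b: (chr(b) if b in SAFE else "%{:02X}".format(b))
    for b in range(256)
}


def toy_quote(data):
    """Percent-encode a byte string by looking up each byte in TOY_MAP."""
    return "".join(TOY_MAP[b] for b in bytearray(data))


# A text character is not a byte value, so the table has no entry for it.
print(u"\xe9" in TOY_MAP)

# Once the text is encoded to bytes, every element has an entry.
print(toy_quote(u"sch\xe9nefeld".encode("utf-8")))
```

The table only ever knows about bytes, so the lookup can only succeed once the text has been turned into bytes.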

So I added s = [el.upper().replace("\\X","%") for el in s] before the call to ''.join(map(quoter, s)), inside a try-except block. Now it works fine.

But I am worried whether what I have done is the correct approach, or whether it will create other issues. Also, I have 200+ Linux instances, so it would be very tough to deploy this fix on all of them.

Answer

Martijn Pieters · Feb 27, 2013

You are trying to quote Unicode data, so you need to decide how to turn that into URL-safe bytes.

Encode the string to bytes first. UTF-8 is often used:

>>> import urllib
>>> urllib.quote(u'sch\xe9nefeld')
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py:1268: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  return ''.join(map(quoter, s))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1268, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xe9'
>>> urllib.quote(u'sch\xe9nefeld'.encode('utf8'))
'sch%C3%A9nefeld'

However, the encoding depends on what the server will accept. It's best to stick to the encoding the original form was sent with.
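If the same fix is needed in many call sites, a small wrapper keeps the encoding decision in one place without touching urllib itself. This is only a sketch: quote_text is a hypothetical helper name, and the UTF-8 default is an assumption that should be replaced by whatever encoding your server expects. The import fallback is there so the snippet also runs on Python 3, where quote lives in urllib.parse.

```python
try:
    from urllib import quote          # Python 2, as in the question
except ImportError:
    from urllib.parse import quote    # Python 3 location of the same function


def quote_text(value, encoding="utf-8", safe="/"):
    """Percent-encode text by encoding it to bytes first (hypothetical helper)."""
    # Byte strings are passed through; text is encoded with the chosen codec.
    if not isinstance(value, bytes):
        value = value.encode(encoding)
    return quote(value, safe)


for name in (u"k\xf6ln", u"br\xf8nsh\xf8j", u"sch\xf6nefeld"):
    print(quote_text(name))

# The same text quoted for a server that expects Latin-1 instead of UTF-8.
print(quote_text(u"k\xf6ln", encoding="latin-1"))
```

The two encodings give different escapes for the same character, which is why the choice has to match what the receiving server decodes.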