To encode the URI, I used urllib.quote("schönefeld"), but when the string contains non-ASCII characters, it throws:

    KeyError: u'\xe9'

The failing line is: return ''.join(map(quoter, s))
My input strings are köln, brønshøj, schönefeld, etc.
When I tried it with just print statements on Windows (Python 2.7, PyScripter IDE), it worked. But on Linux it raises the exception (I guess the platform doesn't matter).
This is what I am trying:

    # -*- coding: utf-8 -*-
    from commands import getstatusoutput
    from urllib import quote

    queryParams = "schönefeld"
    cmdString = "http://baseurl" + quote(queryParams)
    print getstatusoutput(cmdString)
Investigating the cause: inside urllib.quote(), the exception is actually thrown at return ''.join(map(quoter, s)).
The relevant code in urllib (Python 2.7) is:

    def quote(s, safe='/'):
        if not s:
            if s is None:
                raise TypeError('None object cannot be quoted')
            return s
        cachekey = (safe, always_safe)
        try:
            (quoter, safe) = _safe_quoters[cachekey]
        except KeyError:
            safe_map = _safe_map.copy()
            safe_map.update([(c, c) for c in safe])
            quoter = safe_map.__getitem__
            safe = always_safe + safe
            _safe_quoters[cachekey] = (quoter, safe)
        if not s.rstrip(safe):
            return s
        return ''.join(map(quoter, s))
The exception comes from ''.join(map(quoter, s)): quoter is called for every character in s, and the results are joined with '' and returned.
For the non-ASCII character è, _safe_map maps the byte '\xe8' to '%E8'. But when I call quote(u'è'), it looks up the Unicode character u'\xe8', which does not match that byte key, so the KeyError is thrown.
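The key mismatch can be reproduced with a simplified model of that lookup table (a sketch, not urllib's actual code): the keys are single bytes, so a Unicode character has no matching entry, while every byte of the UTF-8 encoded string does. BYTE_MAP and quote_bytes are hypothetical names, the model ignores quote()'s safe parameter, and it is written to run on Python 3 as well as 2.7.

```python
# Simplified model of the table Python 2.7's urllib.quote builds (not the real code).
# BYTE_MAP and quote_bytes are hypothetical names used only for illustration;
# the `safe` parameter of the real quote() is ignored here.
ALWAYS_SAFE = (b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
               b'abcdefghijklmnopqrstuvwxyz'
               b'0123456789_.-')


def build_byte_map():
    """Map each single byte to itself if safe, else to its %XX escape."""
    table = {}
    for i in range(256):
        key = bytes(bytearray([i]))  # a one-byte key such as b'\xe9'
        if key in ALWAYS_SAFE:
            table[key] = key.decode('ascii')
        else:
            table[key] = '%%%02X' % i
    return table


BYTE_MAP = build_byte_map()


def quote_bytes(data):
    """Quote a byte string by looking up each one-byte slice in BYTE_MAP."""
    return ''.join(BYTE_MAP[data[i:i + 1]] for i in range(len(data)))


text = u'sch\xe9nefeld'
print(u'\xe9' in BYTE_MAP)                # False: a text character is not a byte key
print(quote_bytes(text.encode('utf-8')))  # sch%C3%A9nefeld
```

Looking up the text character fails, which is exactly the KeyError above; quoting the encoded bytes succeeds because every one-byte slice is a key.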
So I added s = [el.upper().replace("\\X","%") for el in s] inside a try-except block, just before ''.join(map(quoter, s)), and now it works.
But I am not sure whether what I have done is the correct approach, or whether it will create other issues. Also, I have 200+ Linux instances, so deploying this patch to all of them would be very difficult.
You are trying to quote Unicode data, so you need to decide how to turn that into URL-safe bytes.
Encode the string to bytes first. UTF-8 is often used:

    >>> import urllib
    >>> urllib.quote(u'sch\xe9nefeld')
    /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py:1268: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
      return ''.join(map(quoter, s))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1268, in quote
        return ''.join(map(quoter, s))
    KeyError: u'\xe9'
    >>> urllib.quote(u'sch\xe9nefeld'.encode('utf8'))
    'sch%C3%A9nefeld'
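Applied to the inputs from the question, the fix is to encode each Unicode string before quoting it. Because the encoding happens in the calling code, urllib.py itself needs no patching. This sketch assumes UTF-8 and uses a hypothetical helper name, quote_text; the try/except import is only there so the snippet also runs on Python 3, where the function lives in urllib.parse.

```python
# Portable import: urllib.quote on Python 2.7, urllib.parse.quote on Python 3.
try:
    from urllib import quote          # Python 2.7
except ImportError:
    from urllib.parse import quote    # Python 3


def quote_text(text, encoding='utf-8'):
    """Encode Unicode text to bytes first, then percent-quote the bytes.

    quote_text is a hypothetical helper, not part of urllib.
    """
    return quote(text.encode(encoding))


base_url = 'http://baseurl'  # placeholder base URL from the question
for city in [u'k\xf6ln', u'br\xf8nsh\xf8j', u'sch\xf6nefeld']:
    print(base_url + quote_text(city))
```

On Python 3, urllib.parse.quote also accepts str directly and encodes it as UTF-8 by default, so the explicit encode is only strictly required on Python 2.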
However, the encoding depends on what the server will accept. It's best to stick to the encoding the original form was sent with.
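To see why the choice matters, here is the same name quoted from two different encodings. UTF-8 and Latin-1 are only examples; use whatever the target server expects. A server that decodes the query as Latin-1 would read the UTF-8 form as 'schÃ©nefeld'.

```python
# Portable import: urllib.quote on Python 2.7, urllib.parse.quote on Python 3.
try:
    from urllib import quote
except ImportError:
    from urllib.parse import quote

name = u'sch\xe9nefeld'
as_utf8 = quote(name.encode('utf-8'))      # 'sch%C3%A9nefeld' (two bytes for the e-acute)
as_latin1 = quote(name.encode('latin-1'))  # 'sch%E9nefeld' (one byte for the e-acute)
```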