I am trying to encode and store, and decode arguments in Python and getting lost somewhere along the way. Here are my steps:
1) I use google toolkit's gtm_stringByEscapingForURLArgument
to convert an NSString properly for passing into HTTP arguments.
2) On my server (python), I store these string arguments as something like u'1234567890-/:;()$&@".,?!\'[]{}#%^*+=_\\|~<>\u20ac\xa3\xa5\u2022.,?!\''
(note that these are the standard keys on an iphone keypad in the "123" view and the "#+=" view, the \u
and \x
chars in there being some monetary prefixes like pound, yen, etc)
3) I call urllib.quote(myString,'')
on that stored value, presumably to %-escape them for transport to the client so the client can unpercent escape them.
The result is that I am getting an exception when I try to log the result of % escaping. Is there some crucial step I am overlooking that needs to be applied to the stored value with the \u and \x format in order to properly convert it for sending over http?
Update: The suggestion marked as the answer below worked for me. I am providing some updates to address the comments below to be complete, though.
The exception I received cited an issue with \u20ac
. I don't know if it was a problem with that specifically, rather than the fact that it was the first unicode character in the string.
That \u20ac
char is the unicode for the 'euro' symbol. I basically found I'd have issues with it unless I used the urllib2 quote
method.
url encoding a "raw" unicode doesn't really make sense. What you need to do is .encode("utf8")
first so you have a known byte encoding and then .quote()
that.
The output isn't very pretty but it should be a correct uri encoding.
>>> s = u'1234567890-/:;()$&@".,?!\'[]{}#%^*+=_\|~<>\u20ac\xa3\xa5\u2022.,?!\''
>>> urllib2.quote(s.encode("utf8"))
'1234567890-/%3A%3B%28%29%24%26%40%22.%2C%3F%21%27%5B%5D%7B%7D%23%25%5E%2A%2B%3D_%5C%7C%7E%3C%3E%E2%82%AC%C2%A3%C2%A5%E2%80%A2.%2C%3F%21%27'
Remember that you will need to both unquote()
and decode()
this to print it out properly if you're debugging or whatever.
>>> print urllib2.unquote(urllib2.quote(s.encode("utf8")))
1234567890-/:;()$&@".,?!'[]{}#%^*+=_\|~<>€£¥•.,?!'
>>> # oops, nasty  means we've got a utf8 byte stream being treated as an ascii stream
>>> print urllib2.unquote(urllib2.quote(s.encode("utf8"))).decode("utf8")
1234567890-/:;()$&@".,?!'[]{}#%^*+=_\|~<>€£¥•.,?!'
This is, in fact, what the django functions mentioned in another answer do.
The functions django.utils.http.urlquote() and django.utils.http.urlquote_plus() are versions of Python’s standard urllib.quote() and urllib.quote_plus() that work with non-ASCII characters. (The data is converted to UTF-8 prior to encoding.)
Be careful if you are applying any further quotes or encodings not to mangle things.