Using python's urllib.quote_plus on utf-8 strings with 'safe' arguments

Question 1

python utf-8 sparql urllib unicode-escapes

gopalkoduri · Mar 14, 2014 · Viewed 13.2k times · Source

Answer

I'm answering my own question, so that it may help others who face the same issue.

This issue arises when the following import has been executed in the current session before anything else:

from __future__ import unicode_literals

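This turns out to be incompatible with the code below. With unicode_literals in effect, the literal ':/' passed as 'safe' is a unicode string, and the s.rstrip(safe) call inside urllib's quote (visible in the traceback in the question) then makes Python 2 implicitly decode the UTF-8 byte string as ASCII, which fails on the first non-ASCII byte.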

from urllib import quote_plus

name = u'Mayte_Martín'
quote_plus(name.encode('utf-8'), safe=':/')
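
The error message from the traceback can be reproduced in isolation. This is a minimal sketch, assuming the failure comes from Python 2 implicitly decoding the byte string as ASCII when it meets a unicode 'safe' value. The decode is written out explicitly here so the snippet behaves the same on Python 2 and Python 3; the variable names are mine.

```python
# UTF-8 bytes of the name from the question (\u00ed is the accented i).
raw = u'Mayte_Mart\u00edn'.encode('utf-8')

try:
    # Python 2 does this decode implicitly when a byte string is combined
    # with a unicode string; here it is done explicitly for illustration.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```

Byte 0xc3 is the first byte of the two-byte UTF-8 encoding of the accented character, and it sits at index 10 because the ten characters before it are plain ASCII.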

The same code without importing unicode_literals works fine.
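
If unicode_literals has to stay, one option is to make sure both arguments are byte strings before they reach quote_plus. This is a minimal sketch, not a definitive fix: quote_plus_utf8 is a hypothetical helper name, not part of urllib, and the try/except import is only there so the snippet also runs on Python 3. On Python 2 I would try it in a fresh interpreter, on the assumption that urllib caches its lookup state per 'safe' value, so a session that has already hit the error may keep failing.

```python
try:
    from urllib import quote_plus        # Python 2, as used in the question
except ImportError:
    from urllib.parse import quote_plus  # Python 3 location of the same function


def quote_plus_utf8(text, safe=':/'):
    """Hypothetical helper: call quote_plus with byte-string arguments only."""
    if not isinstance(text, bytes):
        # Unicode input: encode to UTF-8 bytes first.
        text = text.encode('utf-8')
    if not isinstance(safe, bytes):
        # A unicode 'safe' (e.g. under unicode_literals) becomes a byte string.
        safe = safe.encode('ascii')
    return quote_plus(text, safe=safe)


name = u'Mayte_Mart\u00edn'  # \u00ed is the accented i in the original name
print(quote_plus_utf8(name))  # Mayte_Mart%C3%ADn
```

Normalising 'safe' inside the helper means callers do not need to remember the b':/' prefix at every call site.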

Question 2

I have a unicode string in python code:

name = u'Mayte_Martín'

I would like to use it in a SPARQL query, which means I should encode the string as UTF-8 and then apply urllib.quote_plus or requests.quote to it. However, both quote functions behave strangely, as the calls with and without the 'safe' argument show.

from urllib import quote_plus

Without 'safe' argument:

quote_plus(name.encode('utf-8'))
Output: 'Mayte_Mart%C3%ADn'

With 'safe' argument:

quote_plus(name.encode('utf-8'), safe=':/')
Output: 
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-164-556248391ee1> in <module>()
----> 1 quote_plus(v, safe=':/')

/usr/lib/python2.7/urllib.pyc in quote_plus(s, safe)
   1273         s = quote(s, safe + ' ')
   1274         return s.replace(' ', '+')
-> 1275     return quote(s, safe)
   1276 
   1277 def urlencode(query, doseq=0):

/usr/lib/python2.7/urllib.pyc in quote(s, safe)
   1264         safe = always_safe + safe
   1265         _safe_quoters[cachekey] = (quoter, safe)
-> 1266     if not s.rstrip(safe):
   1267         return s
   1268     return ''.join(map(quoter, s))

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

The problem seems to be with the rstrip call. I tried changing the call to:

quote_plus(name.encode('utf-8'), safe=u':/'.encode('utf-8'))

But that did not solve the issue. What could be causing it?
