I have a Unicode string in Python code:
name = u'Mayte_Martín'
I would like to use it in a SPARQL query, which means I should encode the string as UTF-8 and pass it through urllib.quote_plus or requests.utils.quote. However, both quoting functions behave strangely, as can be seen when they are called with and without the 'safe' argument.
from urllib import quote_plus
Without the 'safe' argument:
quote_plus(name.encode('utf-8'))
Output: 'Mayte_Mart%C3%ADn'
With the 'safe' argument:
quote_plus(name.encode('utf-8'), safe=':/')
Output:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-164-556248391ee1> in <module>()
----> 1 quote_plus(v, safe=':/')
/usr/lib/python2.7/urllib.pyc in quote_plus(s, safe)
1273 s = quote(s, safe + ' ')
1274 return s.replace(' ', '+')
-> 1275 return quote(s, safe)
1276
1277 def urlencode(query, doseq=0):
/usr/lib/python2.7/urllib.pyc in quote(s, safe)
1264 safe = always_safe + safe
1265 _safe_quoters[cachekey] = (quoter, safe)
-> 1266 if not s.rstrip(safe):
1267 return s
1268 return ''.join(map(quoter, s))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
The problem seems to be in the rstrip call. I tried passing an explicitly encoded 'safe' argument instead:
quote_plus(name.encode('utf-8'), safe=u':/'.encode('utf-8'))
But that did not solve it. What could be causing this?
I'm answering my own question, so that it may help others who face the same issue.
This particular issue arises when the following import has been made in the current session before anything else is executed.
from __future__ import unicode_literals
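With this import, the literal ':/' passed as 'safe' becomes a unicode object rather than a byte string. Inside quote(), the call s.rstrip(safe) then mixes the UTF-8 encoded byte string with a unicode argument, so Python 2 tries to decode the bytes as ASCII and fails on 0xc3. That makes the import incompatible with the following sequence of code.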
from urllib import quote_plus
name = u'Mayte_Martín'
quote_plus(name.encode('utf-8'), safe=':/')
The same code without importing unicode_literals works fine.
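If you need to keep unicode_literals, one workaround is to make the 'safe' argument a byte string explicitly with a b prefix. This is a minimal sketch, assuming a fresh interpreter session (urllib appears to cache its quoters per 'safe' value, so an earlier failing call in the same session may still interfere). The try/except import is only there so the snippet also runs on Python 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import quote_plus        # Python 2
except ImportError:
    from urllib.parse import quote_plus  # Python 3

name = u'Mayte_Martín'

# The b prefix keeps 'safe' a byte string even when unicode_literals is active,
# so it has the same type as the UTF-8 encoded input.
encoded = quote_plus(name.encode('utf-8'), safe=b':/')
print(encoded)
```

This should print the same 'Mayte_Mart%C3%ADn' as the call without 'safe'.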