'ascii' codec can't encode character at position * ord not in range(128)

minocha picture minocha · Mar 12, 2013 · Viewed 12.7k times · Source

There are a few threads on stackoverflow, but i couldn't find a valid solution to the problem as a whole.

I have collected huge sums of textual data from the urllib read function and stored the same in pickle files.

Now I want to write this data to a file. While writing i'm getting errors similar to -

'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)

and a lot of data is being lost.

I suppose the data off the urllib read is byte data

I've tried

   1. text=text.decode('ascii','ignore')
   2. s=filter(lambda x: x in string.printable, s)
   3. text=u''+text
      text=text.decode().encode('utf-8')

but still im ending up with similar errors. Can somebody point out a proper solution. And also would codecs strip work. I have no issues if the conflict bytes are not written to the file as a string hence the loss is accepted.

Answer

Thanasis Petsas picture Thanasis Petsas · Mar 12, 2013

You can do it through smart_str of Django module. Just try this:

from django.utils.encoding import smart_str, smart_unicode

text = u'\u2019'
print smart_str(text)

You can install Django by starting a command shell with administrator privileges and run this command:

pip install Django