There are a few threads on Stack Overflow, but I couldn't find a valid solution to the problem as a whole.
I have collected a large amount of text data using urllib's read function and stored it in pickle files.
Now I want to write this data to a file. While writing, I'm getting errors similar to:
'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)
and a lot of data is being lost.
I suppose the data returned by urllib's read is byte data.
I've tried:
1. text=text.decode('ascii','ignore')
2. s=filter(lambda x: x in string.printable, s)
3. text=u''+text
text=text.decode().encode('utf-8')
but I'm still ending up with similar errors. Can somebody point out a proper solution? Also, would codecs strip work? I don't mind if the offending characters are not written to the file, so that loss is acceptable.
You can do it with smart_str from Django's django.utils.encoding module. Just try this:
from django.utils.encoding import smart_str

text = u'\u2019'
# On Python 2, smart_str returns a UTF-8 encoded byte string
print smart_str(text)
You can install Django by starting a command shell with administrator privileges and running this command:
pip install Django
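If you would rather not install Django just for this, the same idea can be sketched with the standard library alone. The error message suggests the text is already a unicode object when it fails, and in Python 2 calling .decode() on a unicode object first encodes it implicitly with ASCII, which is likely why the attempts in the question raise the same error. The sketch below assumes the downloaded bytes are UTF-8 (real pages may use other encodings); to_text, write_text, the sample bytes and the out.txt path are hypothetical names used only for illustration.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def to_text(data, encoding="utf-8"):
    """Return text; decode byte strings, replacing bytes that fail to decode."""
    if isinstance(data, bytes):
        return data.decode(encoding, "replace")
    return data


def write_text(path, data, encoding="utf-8", errors="strict"):
    """Write data to path with an explicit encoding instead of the ASCII default."""
    with io.open(path, "w", encoding=encoding, errors=errors) as handle:
        handle.write(to_text(data))


# Sample UTF-8 bytes containing U+2019, standing in for what read() returned
raw = b"It\xe2\x80\x99s done"
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Option 1: write as UTF-8, so nothing is lost
write_text(path, raw)
with io.open(path, encoding="utf-8") as handle:
    kept = handle.read()

# Option 2: write as ASCII and silently drop characters that do not fit
write_text(path, raw, encoding="ascii", errors="ignore")
with io.open(path, encoding="ascii") as handle:
    stripped = handle.read()
```

io.open behaves the same way on Python 2 and Python 3, so the sketch should not depend on the interpreter version. Option 1 is usually preferable; Option 2 matches the case where losing the non-ASCII characters is acceptable.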