Python pickle error: UnicodeDecodeError

90abyss · Oct 5, 2015 · Viewed 63.1k times

I'm trying to do some text classification using TextBlob. I first train the model and then serialize it with pickle, as shown below.

import pickle
from textblob.classifiers import NaiveBayesClassifier

with open('sample.csv', 'r') as fp:
    cl = NaiveBayesClassifier(fp, format="csv")

f = open('sample_classifier.pickle', 'wb')
pickle.dump(cl, f)
f.close()

And when I try to load the pickled classifier with this file:

import pickle
f = open('sample_classifier.pickle', encoding="utf8")
cl = pickle.load(f)    
f.close()

I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

The following are the contents of my sample.csv:

My SQL is not working correctly at all. This was a wrong choice, SQL

I've issues. Please respond immediately, Support

Where am I going wrong here? Please help.

Answer

donkopotamus · Oct 5, 2015

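By opening the file in mode wb, you chose to write raw binary data; no character encoding is applied. Your reading code, however, opens the file in text mode with encoding="utf8", so Python tries to decode the pickle bytes as UTF-8 and fails on the very first byte (0x80, which starts a pickle stream in protocol 2 and later).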

Thus, to read this file, you should simply open it in mode rb and drop the encoding argument.
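
As a rough illustration of the round trip, here is a minimal sketch. It assumes a plain dict as a stand-in for the trained classifier (so it runs without TextBlob installed) and writes to a temporary directory; the names model, path, restored and text_mode_error are made up for this example.

```python
import os
import pickle
import tempfile

# Stand-in for the trained classifier; any picklable object behaves the same way.
model = {"labels": ["SQL", "Support"]}

# Hypothetical location for the example; the question uses the current directory.
path = os.path.join(tempfile.mkdtemp(), "sample_classifier.pickle")

# Write in binary mode: pickle produces bytes, not text.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Read in binary mode as well, with no encoding argument.
with open(path, "rb") as f:
    restored = pickle.load(f)

# Opening the same file in text mode reproduces the error from the question.
text_mode_error = None
try:
    with open(path, encoding="utf8") as f:
        pickle.load(f)
except UnicodeDecodeError as exc:
    text_mode_error = exc

print(restored == model)
print(type(text_mode_error).__name__)
```

Using with blocks also closes the file automatically, so the explicit f.close() calls are not needed.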