I am trying to load the MNIST dataset linked here in Python 3.2 using this program:
import pickle
import gzip
import numpy
with gzip.open('mnist.pkl.gz', 'rb') as f:
    l = list(pickle.load(f))
    print(l)
Unfortunately, it gives me the error:
Traceback (most recent call last):
  File "mnist.py", line 7, in <module>
    train_set, valid_set, test_set = pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)
I then tried to decode the pickled file in Python 2.7, and re-encode it. So, I ran this program in Python 2.7:
import pickle
import gzip
import numpy
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f)
# Printing out the three objects reveals that they are
# all pairs containing numpy arrays.
with gzip.open('mnistx.pkl.gz', 'wb') as g:
    pickle.dump(
        (train_set, valid_set, test_set),
        g,
        protocol=2)  # I also tried protocol 0.
It ran without error, so I reran this program in Python 3.2:
import pickle
import gzip
import numpy
# note the filename change
with gzip.open('mnistx.pkl.gz', 'rb') as f:
    l = list(pickle.load(f))
    print(l)
However, it gave me the same error as before. How do I get this to work?
This looks like an incompatibility between Python 2 and Python 3 pickles. The unpickler is trying to load a "binstring" object, which Python 3 assumes to be ASCII text, while in this case it holds binary data. Whether this is a bug in the Python 3 unpickler or a "misuse" of the pickler by numpy, I don't know.
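To make the binstring problem concrete, here is a minimal sketch that uses only the standard library. The byte string `PY2_PICKLE` is a hand-written stand-in (not taken from the MNIST file) for what Python 2 emits when it pickles a `str` containing the byte 0x90, and `try_load` is a hypothetical helper added for illustration.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 str '\x90abc':
# PROTO 2, SHORT_BINSTRING (length 4), BINPUT 0, STOP.
# This is an illustrative stand-in, not data from mnist.pkl.gz.
PY2_PICKLE = b"\x80\x02U\x04\x90abcq\x00."


def try_load(data, **kwargs):
    """Return the unpickled value, or the exception name if decoding fails."""
    try:
        return pickle.loads(data, **kwargs)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# Default: Python 3 decodes binstrings as ASCII, so byte 0x90 is rejected.
default_result = try_load(PY2_PICKLE)
# latin1 maps every byte 0-255 to the code point with the same value.
latin1_result = try_load(PY2_PICKLE, encoding="latin1")
# 'bytes' leaves the binstring undecoded as a bytes object.
bytes_result = try_load(PY2_PICKLE, encoding="bytes")

print(default_result, repr(latin1_result), bytes_result)
```

The default call fails in the same way as the traceback in the question, while the two explicit encodings both get past the offending byte.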
Here is something of a workaround, but I don't know how meaningful the data is at this point:
import pickle
import gzip
import numpy
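# Note: plain open() is used here, so mnist.pkl must be the gunzipped file.
with open('mnist.pkl', 'rb') as f:
    u = pickle._Unpickler(f)
    # Decode Python 2 binstrings as latin1 instead of the default ASCII.
    u.encoding = 'latin1'
    p = u.load()
    print(p)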
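The same effect should be reachable without the private `_Unpickler` class, because `pickle.load` in Python 3 accepts an `encoding` keyword. The Python documentation states that `encoding='latin1'` is required for unpickling NumPy arrays pickled by Python 2. The sketch below reads the gzipped file directly; `load_py2_pickle` is a hypothetical helper name, not part of any library.

```python
import gzip
import pickle


def load_py2_pickle(path, encoding="latin1"):
    """Load a gzipped pickle written by Python 2.

    latin1 maps each byte to the code point of the same value, so
    8-bit string data survives the decoding step unchanged.
    """
    with gzip.open(path, "rb") as f:
        return pickle.load(f, encoding=encoding)


# Assumed usage with the file from the question (needs numpy installed):
# train_set, valid_set, test_set = load_py2_pickle("mnist.pkl.gz")
```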
Unpickling it in Python 2 and then repickling it is only going to create the same problem again, so you need to save it in another format.
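One possible "other format" is NumPy's own `.npz` archive, which stores plain numeric arrays without going through pickle. The sketch below is an assumption about how that could look, not something tested against the MNIST file: `save_splits` would be run under Python 2 on the unpickled data and `load_splits` under Python 3. Both function names, the array key names, and the tiny stand-in arrays are made up for illustration.

```python
import os
import tempfile

import numpy as np


def save_splits(path, train_set, valid_set, test_set):
    """Write three (inputs, labels) pairs into one compressed .npz archive."""
    np.savez_compressed(
        path,
        train_x=train_set[0], train_y=train_set[1],
        valid_x=valid_set[0], valid_y=valid_set[1],
        test_x=test_set[0], test_y=test_set[1],
    )


def load_splits(path):
    """Read the archive back as three (inputs, labels) pairs."""
    with np.load(path) as data:
        return tuple(
            (data[name + "_x"], data[name + "_y"])
            for name in ("train", "valid", "test")
        )


def fake_pair(n):
    """Tiny stand-in for an (inputs, labels) pair; not real MNIST data."""
    return (np.zeros((n, 4), dtype=np.float32), np.arange(n, dtype=np.int64))


demo_path = os.path.join(tempfile.mkdtemp(), "mnist_demo.npz")
save_splits(demo_path, fake_pair(5), fake_pair(2), fake_pair(3))
train, valid, test = load_splits(demo_path)
print(train[0].shape, valid[1], test[1])
```

Storing each array under its own key keeps the archive free of pickled objects, so the Python 2 string encoding never comes into play when reading it.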