I am trying to read a binary file in Python. From the dataset page:
The pixels are stored as unsigned chars (1 byte) and take values from 0 to 255
I have tried the following, which prints (0,) rather than an array of 784,000 values.
# -*- coding: utf8 -*-
# Processed MNIST dataset (http://cis.jhu.edu/~sachin/digit/digit.html)
import struct
f = open('data/data0', mode='rb')
data = []
print struct.unpack('<i', f.read(4))
How can I read this binary file into either a flat array of 784,000 values (28 x 28 pixels x 1,000 samples, one byte per pixel) or a 28x28x1000 3D array? I have never worked with binary files before and am quite confused!
f.read() will get you an immutable array of 784,000 bytes (called a str in Python 2). If you need it to be mutable, you can use the array module and its array type, which can store various primitives, including unsigned bytes (represented by the B type code):
from array import array

data = array('B')  # 'B' is the type code for unsigned bytes (0 to 255)
with open('data/data0', 'rb') as f:
    data.fromfile(f, 784000)  # 28 * 28 pixels * 1000 samples
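As a minimal, self-contained sketch of the same idea, the following writes a few known bytes to a temporary file and reads them back with array('B'). The temporary file stands in for data/data0, and the helper name read_unsigned_bytes is made up for illustration:

```python
import os
import tempfile
from array import array


def read_unsigned_bytes(path, count):
    """Read `count` unsigned bytes from `path` into a mutable array('B').

    Hypothetical helper for illustration; array.fromfile raises EOFError
    if the file holds fewer than `count` items.
    """
    values = array('B')
    with open(path, 'rb') as f:
        values.fromfile(f, count)
    return values


# Stand-in for data/data0: a tiny file with known contents.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(bytearray([0, 1, 128, 255]))

pixels = read_unsigned_bytes(demo_path, 4)
os.remove(demo_path)

print(list(pixels))  # [0, 1, 128, 255]
pixels[0] = 7        # unlike the result of f.read(), the array is mutable
```

Each element comes back as a plain integer between 0 and 255, so no struct.unpack call is needed for single-byte values.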
This can be sliced as necessary:
EXAMPLE_SIZE = 28 * 28  # pixels per image
examples = [data[s:s + EXAMPLE_SIZE] for s in xrange(0, len(data), EXAMPLE_SIZE)]  # use range in Python 3
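To get something closer to the 3D layout the question asks about, each flat example can be cut further into rows. This is a rough sketch that assumes the images are stored one after another in row-major order (an assumption worth checking against the dataset page) and uses synthetic data instead of the real file; to_images is a hypothetical helper name:

```python
from array import array

ROWS, COLS = 28, 28
EXAMPLE_SIZE = ROWS * COLS


def to_images(flat, rows=ROWS, cols=COLS):
    """Split a flat sequence of pixels into a list of rows x cols images.

    Assumes row-major storage, one image after another (an assumption,
    not something confirmed by the dataset description quoted above).
    """
    size = rows * cols
    if len(flat) % size != 0:
        raise ValueError("length is not a multiple of rows * cols")
    return [
        [list(flat[start + r * cols:start + (r + 1) * cols]) for r in range(rows)]
        for start in range(0, len(flat), size)
    ]


# Synthetic stand-in for the real data: 3 images filled with 0, 1 and 2.
flat = array('B', [0] * EXAMPLE_SIZE + [1] * EXAMPLE_SIZE + [2] * EXAMPLE_SIZE)
images = to_images(flat)

shape = (len(images), len(images[0]), len(images[0][0]))
print(shape)              # (3, 28, 28)
print(images[2][27][27])  # 2
```

The result is indexed as images[sample][row][column], so it is a 1000x28x28 arrangement rather than 28x28x1000. If NumPy is an option, numpy.fromfile('data/data0', dtype=numpy.uint8).reshape(1000, 28, 28) should give the same arrangement as a real 3D array, under the same storage-order assumption.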