How to read binary file data into arrays?

David Ferris · Jan 6, 2017

I am trying to read a binary file in Python. From the dataset page:

The pixels are stored as unsigned chars (1 byte) and take values from 0 to 255

I have tried the following, which prints (0,) rather than a 784,000-element array.

# -*- coding: utf8 -*-
# Processed MNIST dataset (http://cis.jhu.edu/~sachin/digit/digit.html)
import struct

f = open('data/data0', mode='rb')
data = []

print struct.unpack('<i', f.read(4))

How can I read this binary file into either a flat 784,000-element array (28 x 28 pixels x 1,000 samples) or a 28x28x1000 3D array? I have never worked with binary files before, and am quite confused!

Answer

Ry- · Jan 6, 2017

f.read() will get you an immutable array of 784,000 bytes (called a str in Python 2). If you need it to be mutable, you can use the array module and its array type capable of storing various primitives, unsigned bytes (represented by the B code) included:

from array import array

data = array('B')

with open('data/data0', 'rb') as f:
    data.fromfile(f, 784000)

This can be sliced as necessary:

EXAMPLE_SIZE = 28 * 28  # bytes per image: 28 x 28 pixels, 1 byte each
examples = [data[s:s + EXAMPLE_SIZE] for s in xrange(0, len(data), EXAMPLE_SIZE)]
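For the 3D layout the question asks about, the same array-module approach can be extended to group each image into rows. The following is only a sketch under stated assumptions: the file is a headerless stream of 28 x 28 unsigned bytes per image stored row by row, and load_images, ROWS, COLS and the synthetic demo file are names invented here for illustration, not part of the dataset or the original answer. It is written to run on both Python 2 and 3, so it uses range rather than xrange. As a side note, struct.unpack('<i', f.read(4)) in the question interprets the first four bytes as one 32-bit integer, so it would print (0,) if the first four pixels are all zero.

```python
import os
import tempfile
from array import array

ROWS, COLS = 28, 28          # image dimensions stated in the question
EXAMPLE_SIZE = ROWS * COLS   # 784 bytes per image


def load_images(path, count):
    """Read `count` images of ROWS x COLS unsigned bytes from `path`.

    Hypothetical helper: assumes no header and row-major pixel order.
    """
    data = array('B')
    with open(path, 'rb') as f:
        # fromfile raises EOFError if the file holds fewer items than requested
        data.fromfile(f, count * EXAMPLE_SIZE)
    # images[i][r][c] is the pixel at row r, column c of image i
    return [
        [list(data[s + r * COLS:s + (r + 1) * COLS]) for r in range(ROWS)]
        for s in range(0, len(data), EXAMPLE_SIZE)
    ]


# Demonstration on a small synthetic file (stand-in data, not the real dataset).
demo_count = 3
demo_bytes = bytearray((i * 7) % 256 for i in range(demo_count * EXAMPLE_SIZE))
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(demo_bytes)

images = load_images(demo_path, demo_count)
os.remove(demo_path)
```

With the real file, something like load_images('data/data0', 1000) should give a list indexed as images[sample][row][column]. Nested lists are convenient for small experiments but memory-hungry; keeping the flat array and computing offsets by hand is the lighter option.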