How can I serialize a numpy array while preserving matrix dimensions?

blz picture blz · Jun 7, 2015 · Viewed 42k times · Source

numpy.array.tostring doesn't seem to preserve information about matrix dimensions (see this question), requiring the user to issue a call to numpy.array.reshape.

Is there a way to serialize a numpy array to JSON format while preserving this information?

Note: The arrays may contain ints, floats or bools. It's reasonable to expect a transposed array.

Note 2: this is being done with the intent of passing the numpy array through a Storm topology using streamparse, in case such information ends up being relevant.

Answer

pickle.dumps or numpy.save encode all the information needed to reconstruct an arbitrary NumPy array, even in the presence of endianness issues, non-contiguous arrays, or weird tuple dtypes. Endianness issues are probably the most important; you don't want array([1]) to suddenly become array([16777216]) because you loaded your array on a big-endian machine. pickle is probably the more convenient option, though save has its own benefits, given in the npy format rationale.

The pickle option:

import pickle
a = # some NumPy array
serialized = pickle.dumps(a, protocol=0) # protocol 0 is printable ASCII
deserialized_a = pickle.loads(serialized)

numpy.save uses a binary format, and it needs to write to a file, but you can get around that with io.BytesIO:

a = # any NumPy array
memfile = io.BytesIO()
numpy.save(memfile, a)
memfile.seek(0)
serialized = json.dumps(memfile.read().decode('latin-1'))
# latin-1 maps byte n to unicode code point n

And to deserialize:

memfile = io.BytesIO()
memfile.write(json.loads(serialized).encode('latin-1'))
memfile.seek(0)
a = numpy.load(memfile)