numpy.array.tostring doesn't seem to preserve information about matrix dimensions (see this question), requiring the user to issue a call to numpy.array.reshape.
Is there a way to serialize a numpy array to JSON format while preserving this information?
Note: The arrays may contain ints, floats or bools. It's reasonable to expect a transposed array.
Note 2: this is being done with the intent of passing the numpy array through a Storm topology using streamparse, in case such information ends up being relevant.
pickle.dumps or numpy.save encode all the information needed to reconstruct an arbitrary NumPy array, even in the presence of endianness issues, non-contiguous arrays, or weird tuple dtypes. Endianness issues are probably the most important; you don't want array([1]) to suddenly become array([16777216]) because you loaded your array on a big-endian machine. pickle is probably the more convenient option, though save has its own benefits, given in the npy format rationale.
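To make the endianness point concrete, here is a minimal sketch of how raw bytes written as little-endian are misread when interpreted as big-endian. The explicit dtypes '<i4' and '>i4' are chosen for illustration only; the answer itself does not name a dtype.

```python
import numpy as np

# Raw bytes of the 32-bit integer 1, written in little-endian order.
raw = np.array([1], dtype="<i4").tobytes()

# Interpreting the same four bytes as big-endian gives a different number.
misread = np.frombuffer(raw, dtype=">i4")

# Interpreting them with the byte order they were written in is correct.
correct = np.frombuffer(raw, dtype="<i4")

print(misread, correct)
```

Raw bytes carry no byte-order marker, which is why a format that records the dtype alongside the data avoids this problem.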
The pickle option:
import pickle
import numpy
a = numpy.arange(6).reshape(2, 3)  # stand-in for any NumPy array
serialized = pickle.dumps(a, protocol=0)  # protocol 0 is the original "human-readable" protocol
deserialized_a = pickle.loads(serialized)
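Since the question asks for JSON specifically, the pickled bytes still have to be turned into a JSON string. The following is a sketch under the assumption of Python 3, where pickle.dumps returns bytes; it reuses the latin-1 trick so that every byte maps to exactly one code point. The sample array is made up for illustration.

```python
import json
import pickle

import numpy as np

# A transposed (non-contiguous) array, as the question anticipates.
a = np.arange(6, dtype=np.int64).reshape(2, 3).T

# Decode the pickle bytes with latin-1 so they can be embedded in JSON.
payload = json.dumps(pickle.dumps(a, protocol=0).decode("latin-1"))

# Reverse the steps: JSON string -> bytes -> array.
restored = pickle.loads(json.loads(payload).encode("latin-1"))

print(restored.shape, restored.dtype)
```

Shape, dtype and contents should all survive the round trip, with no separate reshape call.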
numpy.save uses a binary format, and it needs to write to a file, but you can get around that with io.BytesIO:
import io
import json
import numpy
a = numpy.arange(6).reshape(2, 3)  # stand-in for any NumPy array
memfile = io.BytesIO()
numpy.save(memfile, a)
memfile.seek(0)
serialized = json.dumps(memfile.read().decode('latin-1'))
# latin-1 maps byte n to unicode code point n
And to deserialize:
memfile = io.BytesIO()
memfile.write(json.loads(serialized).encode('latin-1'))
memfile.seek(0)
a = numpy.load(memfile)
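The two numpy.save snippets can be wrapped into a pair of helpers and checked against the array types mentioned in the question. This is only a sketch: array_to_json and array_from_json are hypothetical names, and passing allow_pickle=False is an extra choice made here so that only plain numeric data is accepted.

```python
import io
import json

import numpy as np


def array_to_json(arr):
    """Serialize an array to a JSON string via the .npy format (hypothetical helper)."""
    memfile = io.BytesIO()
    np.save(memfile, arr, allow_pickle=False)
    # latin-1 maps byte n to code point n, so no bytes are lost.
    return json.dumps(memfile.getvalue().decode("latin-1"))


def array_from_json(text):
    """Inverse of array_to_json (hypothetical helper)."""
    memfile = io.BytesIO(json.loads(text).encode("latin-1"))
    return np.load(memfile, allow_pickle=False)


samples = [
    np.arange(6, dtype=np.int32).reshape(2, 3).T,  # transposed ints
    np.linspace(0.0, 1.0, 4).reshape(2, 2),        # floats
    np.array([[True, False], [False, True]]),      # bools
]

for original in samples:
    restored = array_from_json(array_to_json(original))
    # Shape, dtype and values should all match the input.
    assert restored.shape == original.shape
    assert restored.dtype == original.dtype
    assert np.array_equal(restored, original)
```

Because the .npy header stores the shape, dtype and memory order, the transposed array comes back with the same dimensions.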