I have tried several ways of compressing numpy arrays when saving them to disk.
These 1D arrays contain data sampled at a given sampling rate (for example sound recorded with a microphone, or any other measurement from a sensor): the underlying signal is essentially continuous (in the mathematical sense; after sampling it is of course discrete data).
I tried HDF5 (h5py):
f.create_dataset("myarray1", data=myarray, compression="gzip", compression_opts=9)
but this is quite slow, and the compression ratio is not the best we can expect.
I also tried numpy.savez_compressed(), but once again it may not be the best compression algorithm for the kind of data described above.
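To put a number on how well numpy.savez_compressed() does, the compressed size can be compared with the raw size. This is only a minimal sketch: the synthetic 16-bit sine wave with a little noise, the `myarray1` key and the in-memory buffer are illustrative assumptions standing in for a real recording and a real file.

```python
import io
import numpy as np

# Hypothetical test signal: 1 s of a 440 Hz tone at 44.1 kHz, plus small noise.
rng = np.random.default_rng(0)
t = np.arange(44100) / 44100.0
signal = 10000 * np.sin(2 * np.pi * 440 * t) + rng.normal(0, 5, t.size)
myarray = np.round(signal).astype(np.int16)

# Save to an in-memory buffer instead of a file, so the size is easy to measure.
buf = io.BytesIO()
np.savez_compressed(buf, myarray1=myarray)
compressed_size = len(buf.getvalue())

print("raw bytes:       ", myarray.nbytes)
print("compressed bytes:", compressed_size)

# Round trip: the compression is lossless, so the data must come back unchanged.
buf.seek(0)
with np.load(buf) as archive:
    restored = archive["myarray1"]
assert np.array_equal(restored, myarray)
```

The same measurement on real data gives a baseline against which any other method can be compared.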
What would you choose for a better compression ratio on a numpy array with such data?
(I thought about something like FLAC, a lossless codec initially designed for audio, but is there an easy way to apply such an algorithm to numpy data?)
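FLAC works roughly by predicting each sample from the previous ones and then coding the small residual compactly. The following is not FLAC, only a minimal sketch of the same idea with the simplest possible predictor (the previous sample), followed by zlib. It assumes a non-empty 1D array of integer samples; the helper names `pack_delta` and `unpack_delta` and the synthetic test tone are hypothetical.

```python
import zlib
import numpy as np

def pack_delta(x):
    """Delta-encode a non-empty 1D integer array, then compress with zlib."""
    delta = np.empty_like(x)
    delta[0] = x[0]
    # Differences wrap around on overflow; the wrapping cumsum undoes this exactly.
    delta[1:] = x[1:] - x[:-1]
    return zlib.compress(delta.tobytes(), 9)

def unpack_delta(blob, dtype, count):
    """Inverse of pack_delta: decompress, then sum the differences back up."""
    delta = np.frombuffer(zlib.decompress(blob), dtype=dtype, count=count)
    return np.cumsum(delta, dtype=dtype)

# Hypothetical smooth signal: 1 s of a 440 Hz tone sampled at 44.1 kHz.
t = np.arange(44100) / 44100.0
x = np.round(10000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

blob = pack_delta(x)
plain = zlib.compress(x.tobytes(), 9)
print("zlib only:   ", len(plain))
print("delta + zlib:", len(blob))

# The transform is lossless: decoding returns the original samples.
assert np.array_equal(unpack_delta(blob, x.dtype, x.size), x)
```

Whether the delta step helps depends on how smooth the signal is relative to the sampling rate, so it is worth measuring on the actual data.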
What I do now:
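import gzip
import numpy

# Write the array in .npy format through a gzip stream;
# the with-block closes the file even if saving fails.
with gzip.GzipFile("my_array.npy.gz", "w") as f:
    numpy.save(file=f, arr=my_array)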
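For completeness, a file written this way can be read back by opening it with gzip and handing the file object to numpy.load. A minimal round-trip sketch, where the small placeholder array and the temporary directory are illustrative assumptions:

```python
import gzip
import os
import tempfile
import numpy

my_array = numpy.arange(1000, dtype=numpy.int16)  # placeholder data

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "my_array.npy.gz")

    # Write: same approach as above, with the file closed automatically.
    with gzip.GzipFile(path, "w") as f:
        numpy.save(file=f, arr=my_array)

    # Read: gzip decompresses on the fly while numpy parses the .npy stream.
    with gzip.GzipFile(path, "r") as f:
        loaded = numpy.load(f)

assert numpy.array_equal(loaded, my_array)
```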