Compress numpy arrays efficiently

Basj picture Basj · Mar 14, 2014 · Viewed 31.4k times · Source

I tried various methods to do data compression when saving to disk some numpy arrays.

These 1D arrays contain sampled data at a certain sampling rate (can be sound recorded with a microphone, or any other measurment with any sensor) : the data is essentially continuous (in a mathematical sense ; of course after sampling it is now discrete data).

I tried with HDF5 (h5py) :

f.create_dataset("myarray1", myarray, compression="gzip", compression_opts=9)

but this is quite slow, and the compression ratio is not the best we can expect.

I also tried with

numpy.savez_compressed()

but once again it may not be the best compression algorithm for such data (described before).

What would you choose for better compression ratio on a numpy array, with such data ?

(I thought about things like lossless FLAC (initially designed for audio), but is there an easy way to apply such an algorithm on numpy data ?)

Answer

Albert picture Albert · Nov 11, 2016

What I do now:

import gzip
import numpy

f = gzip.GzipFile("my_array.npy.gz", "w")
numpy.save(file=f, arr=my_array)
f.close()