Save / load scipy sparse csr_matrix in portable data format

Henry Thornton picture Henry Thornton · Jan 21, 2012 · Viewed 56k times · Source

How do you save/load a scipy sparse csr_matrix in a portable format? The scipy sparse matrix is created on Python 3 (Windows 64-bit) to run on Python 2 (Linux 64-bit). Initially, I used pickle (with protocol=2 and fix_imports=True) but this didn't work going from Python 3.2.2 (Windows 64-bit) to Python 2.7.2 (Windows 32-bit) and got the error:

TypeError: ('data type not understood', <built-in function _reconstruct>, (<type 'numpy.ndarray'>, (0,), '[98]')).

Next, tried numpy.save and numpy.load as well as scipy.io.mmwrite() and scipy.io.mmread() and none of these methods worked either.

Answer

Henry Thornton picture Henry Thornton · Jan 24, 2012

edit: SciPy 1.19 now has scipy.sparse.save_npz and scipy.sparse.load_npz.

from scipy import sparse

sparse.save_npz("yourmatrix.npz", your_matrix)
your_matrix_back = sparse.load_npz("yourmatrix.npz")

For both functions, the file argument may also be a file-like object (i.e. the result of open) instead of a filename.


Got an answer from the Scipy user group:

A csr_matrix has 3 data attributes that matter: .data, .indices, and .indptr. All are simple ndarrays, so numpy.save will work on them. Save the three arrays with numpy.save or numpy.savez, load them back with numpy.load, and then recreate the sparse matrix object with:

new_csr = csr_matrix((data, indices, indptr), shape=(M, N))

So for example:

def save_sparse_csr(filename, array):
    np.savez(filename, data=array.data, indices=array.indices,
             indptr=array.indptr, shape=array.shape)

def load_sparse_csr(filename):
    loader = np.load(filename)
    return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                      shape=loader['shape'])