How do you save/load a scipy sparse csr_matrix in a portable format? The scipy sparse matrix is created on Python 3 (Windows 64-bit) and needs to run on Python 2 (Linux 64-bit). Initially, I used pickle (with protocol=2 and fix_imports=True), but this didn't work going from Python 3.2.2 (Windows 64-bit) to Python 2.7.2 (Windows 32-bit); I got the error:
TypeError: ('data type not understood', <built-in function _reconstruct>, (<type 'numpy.ndarray'>, (0,), '[98]')).
Next, I tried numpy.save and numpy.load as well as scipy.io.mmwrite() and scipy.io.mmread(), and none of these methods worked either.
Edit: SciPy 0.19 now has scipy.sparse.save_npz and scipy.sparse.load_npz.
from scipy import sparse
sparse.save_npz("yourmatrix.npz", your_matrix)
your_matrix_back = sparse.load_npz("yourmatrix.npz")
For both functions, the file argument may also be a file-like object (i.e. the result of open) instead of a filename.
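As a rough illustration of the file-like case, the sketch below round-trips a small matrix through an in-memory io.BytesIO buffer, which stands in for a file opened in binary mode. The example matrix and the variable names are made up for the demonstration, and it assumes a SciPy version that provides save_npz and load_npz.

```python
import io

import numpy as np
from scipy import sparse

# Small example matrix (hypothetical data, just for the demonstration).
your_matrix = sparse.csr_matrix(np.array([[1, 0, 2],
                                          [0, 0, 3],
                                          [4, 5, 6]]))

# Any binary file-like object should work; BytesIO stands in for open(..., "wb").
buffer = io.BytesIO()
sparse.save_npz(buffer, your_matrix)

# Rewind before reading, much as you would reopen a real file in "rb" mode.
buffer.seek(0)
your_matrix_back = sparse.load_npz(buffer)

# Compare the dense forms to confirm the round trip kept every entry.
assert np.array_equal(your_matrix.toarray(), your_matrix_back.toarray())
```

With a real file, the same calls would take the object returned by open in "wb" mode for saving and "rb" mode for loading.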
Got an answer from the Scipy user group:
A csr_matrix has 3 data attributes that matter: .data, .indices, and .indptr. All are simple ndarrays, so numpy.save will work on them. Save the three arrays with numpy.save or numpy.savez, load them back with numpy.load, and then recreate the sparse matrix object with:
new_csr = csr_matrix((data, indices, indptr), shape=(M, N))
So for example:
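import numpy as np
from scipy.sparse import csr_matrix

def save_sparse_csr(filename, array):
    # np.savez appends ".npz" to the filename if it is missing
    np.savez(filename, data=array.data, indices=array.indices,
             indptr=array.indptr, shape=array.shape)

def load_sparse_csr(filename):
    # pass the full file name here, including the ".npz" extension
    loader = np.load(filename)
    return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                      shape=loader['shape'])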
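To make the three arrays concrete, here is a minimal sketch that prints them for a tiny matrix and then rebuilds the matrix from a saved .npz file. The matrix values, the variable names, and the file name example_csr.npz are hypothetical, and the file is written to a temporary directory purely for the demonstration.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 3x3 example matrix.
m = csr_matrix(np.array([[1, 0, 2],
                         [0, 0, 3],
                         [4, 5, 6]]))

# The three arrays that, together with the shape, describe a CSR matrix.
print(m.data)     # stored non-zero values, row by row
print(m.indices)  # column index of each stored value
print(m.indptr)   # row i occupies data[indptr[i]:indptr[i + 1]]

# Save the pieces with np.savez, then rebuild the matrix from them.
path = os.path.join(tempfile.mkdtemp(), "example_csr.npz")
np.savez(path, data=m.data, indices=m.indices, indptr=m.indptr, shape=m.shape)

with np.load(path) as loader:
    rebuilt = csr_matrix((loader["data"], loader["indices"], loader["indptr"]),
                         shape=tuple(loader["shape"]))

# The rebuilt matrix should match the original entry for entry.
assert np.array_equal(m.toarray(), rebuilt.toarray())
```

Because the file holds only plain ndarrays, it sidesteps the pickle incompatibility described in the question, although that is worth verifying on the actual Python 2 target.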