Building and updating a sparse matrix in Python using SciPy

syllogismos · Dec 14, 2013 · Viewed 27.9k times

I'm trying to build and update a sparse matrix as I read data from a file. The matrix is of size 100000 x 40000.

What is the most efficient way to update multiple entries of the sparse matrix? Specifically, I need to increment each entry by 1.

Let's say I have row indices [2, 236, 246, 389, 1691]

and column indices [117, 3, 34, 2757, 74, 1635, 52]

so all the following entries must be incremented by one:

(2,117) (2,3) (2,34) (2,2757) ...

(236,117) (236,3) (236, 34) (236,2757) ...

and so on.

I'm already using lil_matrix, because I got a warning recommending it when I tried to update a single entry.

However, the lil_matrix format does not support updating multiple entries at once: matrix[1:3,0] += [2,3] raises a NotImplementedError.

I can do this naively, by incrementing every entry individually. I was wondering if there is a better way to do this, or a better sparse matrix implementation that I can use.
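For reference, the naive approach mentioned above could look roughly like the sketch below. It uses the example index lists from this question; the variable names (rows, cols, m) are only illustrative.

```python
import scipy.sparse as sps

# Example index lists taken from the question text.
rows = [2, 236, 246, 389, 1691]
cols = [117, 3, 34, 2757, 74, 1635, 52]

# lil_matrix supports cheap single-entry updates.
m = sps.lil_matrix((100000, 40000))

# Increment every (row, col) combination one entry at a time.
for r in rows:
    for c in cols:
        m[r, c] += 1
```

This does one Python-level update per entry, so the cost grows with len(rows) * len(cols) for every batch read from the file.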

My computer is also an average i5 machine with 4GB RAM, so I have to be careful not to blow it up :)

Answer

Jaime · Dec 14, 2013

Creating a second matrix with 1s in your new coordinates and adding it to the existing one is a possible way of doing this:

>>> import numpy as np
>>> import scipy.sparse as sps
>>> rows, cols = 1000, 2000
>>> sps_acc = sps.coo_matrix((rows, cols)) # empty matrix
>>> for j in range(100): # add 100 sets of 100 1's
...     r = np.random.randint(rows, size=100)
...     c = np.random.randint(cols, size=100)
...     d = np.ones((100,))
...     # duplicate (r, c) pairs are summed when the update is added
...     sps_acc = sps_acc + sps.coo_matrix((d, (r, c)), shape=(rows, cols))
... 
>>> sps_acc
<1000x2000 sparse matrix of type '<type 'numpy.float64'>'
    with 9985 stored elements in Compressed Sparse Row format>
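
Applied to the row and column lists from the question, the same idea might look like the sketch below. The helper name increment_block is hypothetical, and expanding the two lists with np.repeat and np.tile is just one way to enumerate every (row, column) combination; the accumulator is kept in CSR format here as an assumption, since that is what the addition returns anyway.

```python
import numpy as np
import scipy.sparse as sps

def increment_block(acc, row_idx, col_idx):
    """Return acc with 1 added at every (r, c) pair of row_idx x col_idx.

    Hypothetical helper, not part of SciPy.
    """
    row_idx = np.asarray(row_idx)
    col_idx = np.asarray(col_idx)
    # Expand the two lists into all (row, col) combinations.
    r = np.repeat(row_idx, len(col_idx))
    c = np.tile(col_idx, len(row_idx))
    d = np.ones(len(r))
    # Build a sparse update holding 1s at those coordinates and add it.
    update = sps.coo_matrix((d, (r, c)), shape=acc.shape)
    return acc + update

acc = sps.csr_matrix((100000, 40000))  # empty accumulator
acc = increment_block(acc, [2, 236, 246, 389, 1691],
                      [117, 3, 34, 2757, 74, 1635, 52])
# A second batch that overlaps the first at (2, 117).
acc = increment_block(acc, [2, 5], [117, 9])
```

Only the nonzero entries are ever stored, so memory use depends on how many distinct coordinates have been touched rather than on the 100000 x 40000 shape.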