I'm trying to build and update a sparse matrix as I read data from a file.
The matrix is of size 100000 x 40000.
What is the most efficient way of updating multiple entries of the sparse matrix? Specifically, I need to increment each entry by 1.
Let's say I have row indices [2, 236, 246, 389, 1691]
and column indices [117, 3, 34, 2757, 74, 1635, 52]
so all the following entries must be incremented by one:
(2,117) (2,3) (2,34) (2,2757) ...
(236,117) (236,3) (236,34) (236,2757) ...
and so on.
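The entries to update are the Cartesian product of the two index lists (here 5 x 7 = 35 pairs). Below is a small NumPy sketch that expands the example lists into two parallel coordinate arrays. This is the `(row, col)` form that `scipy.sparse.coo_matrix((data, (row, col)))` accepts.

```python
import numpy as np

# Row and column indices taken from the example above
row_idx = np.array([2, 236, 246, 389, 1691])
col_idx = np.array([117, 3, 34, 2757, 74, 1635, 52])

# indexing="ij" gives rr[i, j] == row_idx[i] and cc[i, j] == col_idx[j]
rr, cc = np.meshgrid(row_idx, col_idx, indexing="ij")

# Flatten into parallel coordinate arrays: one (row, col) pair per entry
pair_rows = rr.ravel()
pair_cols = cc.ravel()

print(len(pair_rows))  # 35 pairs: 5 rows x 7 columns
# The first three pairs are (2, 117), (2, 3), (2, 34)
```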
I'm already using lil_matrix, because I got a warning recommending it when I tried to update a single entry.
However, the lil_matrix format doesn't seem to support updating multiple entries at once:
matrix[1:3,0] += [2,3]
raises a NotImplementedError.
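One possible workaround that stays within lil_matrix is sketched below. It assumes a reasonably recent SciPy in which lil_matrix supports fancy indexing with `np.ix_`. The idea is to read the small rows x cols block out as a dense array, add 1, and write it back. Because fancy assignment writes each position once, this assumes the index lists contain no duplicates. The 2000 x 3000 shape is only a smaller stand-in for the real matrix.

```python
import numpy as np
import scipy.sparse as sps

# Smaller stand-in for the real 100000 x 40000 matrix
m = sps.lil_matrix((2000, 3000))

row_idx = np.array([2, 236, 246, 389, 1691])
col_idx = np.array([117, 3, 34, 2757, 74, 1635, 52])

# np.ix_ selects the full block: every combination of the two lists
block = np.ix_(row_idx, col_idx)

# m[block] += 1 would try to add a nonzero scalar to a sparse matrix,
# which SciPy rejects; going through a small dense array avoids that
m[block] = m[block].toarray() + 1

# Repeating the update increments the same 35 entries again
m[block] = m[block].toarray() + 1
```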
I can do this naively, by incrementing every entry individually. I was wondering if there is a better way to do this, or a better sparse matrix implementation that I can use.
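For reference, the naive approach looks roughly like this: one scalar update per (row, col) pair on a lil_matrix. Each individual write is cheap, but the work happens in a Python-level double loop, so it slows down as the index lists and the number of lines in the file grow.

```python
import scipy.sparse as sps

row_idx = [2, 236, 246, 389, 1691]
col_idx = [117, 3, 34, 2757, 74, 1635, 52]

m = sps.lil_matrix((100000, 40000))

# Naive approach: increment each (row, col) combination one at a time
for r in row_idx:
    for c in col_idx:
        m[r, c] += 1
```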
My computer is also an average i5 machine with 4GB RAM, so I have to be careful not to blow it up :)
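A back-of-the-envelope memory estimate helps here. A dense float64 array of this shape cannot fit in 4 GB, while sparse storage grows only with the number of stored entries. The per-entry cost below assumes COO-style storage with 32-bit row and column indices plus a float64 value. The nonzero count is a hypothetical placeholder.

```python
# Dense storage: every cell is a float64 (8 bytes)
n_rows, n_cols = 100000, 40000
dense_bytes = n_rows * n_cols * 8
print(dense_bytes / 1e9, "GB dense")  # 32.0 GB

# Assumed COO cost per stored entry: int32 row + int32 col + float64 value
bytes_per_entry = 4 + 4 + 8
nnz = 10_000_000  # hypothetical number of nonzero entries
coo_bytes = nnz * bytes_per_entry
print(coo_bytes / 1e9, "GB for", nnz, "nonzeros")  # 0.16 GB
```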
Creating a second matrix with 1s in your new coordinates and adding it to the existing one is a possible way of doing this:
>>> import numpy as np
>>> import scipy.sparse as sps
>>> rows, cols = 1000, 2000
>>> sps_acc = sps.coo_matrix((rows, cols))  # empty matrix
>>> for j in range(100):  # add 100 sets of 100 1's
...     r = np.random.randint(rows, size=100)
...     c = np.random.randint(cols, size=100)
...     d = np.ones((100,))
...     # duplicate (r, c) pairs are summed, so each hit adds 1
...     sps_acc = sps_acc + sps.coo_matrix((d, (r, c)), shape=(rows, cols))
...
>>> sps_acc
<1000x2000 sparse matrix of type '<type 'numpy.float64'>'
with 9985 stored elements in Compressed Sparse Row format>
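Adding a new COO matrix on every iteration produces a new CSR matrix each time. An alternative sketch collects all the coordinate arrays first and builds a single COO matrix at the end. It assumes the index lists fit in memory; otherwise they could be flushed in chunks. Converting COO to CSR sums duplicate (row, col) pairs, and that summation is what turns repeated 1s into counts. The `batches` list is a hypothetical stand-in for whatever is parsed from the file.

```python
import numpy as np
import scipy.sparse as sps

n_rows, n_cols = 100000, 40000

# Hypothetical parsed input: (row indices, column indices) per record;
# every combination of the two lists should be incremented by 1
batches = [
    ([2, 236, 246, 389, 1691], [117, 3, 34, 2757, 74, 1635, 52]),
    ([2, 10], [117, 5]),
]

row_chunks, col_chunks = [], []
for rows, cols in batches:
    rows = np.asarray(rows, dtype=np.int32)
    cols = np.asarray(cols, dtype=np.int32)
    # Cartesian product: repeat each row once per column, tile the columns
    row_chunks.append(np.repeat(rows, len(cols)))
    col_chunks.append(np.tile(cols, len(rows)))

r = np.concatenate(row_chunks)
c = np.concatenate(col_chunks)
data = np.ones(len(r), dtype=np.int32)  # integer counts instead of float64

# Duplicate (row, col) pairs are summed when converting COO -> CSR
counts = sps.coo_matrix((data, (r, c)), shape=(n_rows, n_cols)).tocsr()
```

Here (2, 117) appears in both batches, so it ends up with a count of 2. Using an int32 `data` array stores integer counts rather than the float64 values that `np.ones` produces by default.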