Dot product sparse matrices

David picture David · Apr 22, 2016 · Viewed 13.4k times · Source

I have two sparse matrices (a and b) in python of the following dimensions:

a = <240760x2177930 sparse matrix of type '<class 'numpy.float64'>'
    with 1127853 stored elements in Compressed Sparse Row format>

and

b = <240760x2177930 sparse matrix of type '<class 'numpy.float64'>'
    with 439309 stored elements in Compressed Sparse Row format>

Question: I'd like to get a column vector of length 240760 that is the row-wise dot product of the two matrices. For example, dot(a[0],b[0]) would be the first element of my output vector. dot(a[1],b[1]) would be the second, and so forth.

Is there a vectorized easy way to accomplish this?

EDIT: One way to accomplish this would be to convert each row into a dense vector, flatten it out, and use numpy.dot(). Something like:

np.dot(np.array(a[0]).flatten(),np.array(b[0]).flatten()).  

But this requires iterating row wise and convert each row into a dense vector, which is very time consuming. I'm thinking there's probably an easier way to do this...

Answer

hpaulj picture hpaulj · Apr 22, 2016

A scipy sparse matrix is modeled on the numpy matrix subclass, and as such implements * as matrix multiplication. a.multiply is element by element muliplication, such as that used by np.array *.

I'd suggest making a couple of small matrices, and try the various forms of multiplication, including what you think is the np.dot equivalent. It will be easier to tell what's going on with something small.

a = np.arange(12).reshape(3,4)
a1 = sparse.csr_matrix(a)

np.dot(a, a.T)
a1 * a.T
a*a
a1.multiply(a1)
etc

Just for reference, is this what you want (using dense arrays):

In [7]: a=np.arange(12).reshape(3,4)

In [8]: [np.dot(a[i],a[i]) for i in range(3)]
Out[8]: [14, 126, 366]

In [9]: np.einsum('ij,ij->i',a,a)
Out[9]: array([ 14, 126, 366])

and the sparse

In [11]: a1=sparse.csr_matrix(a)

The full matrix or dot product is more that what you want, right? You just want the diagonal.

In [15]: (a1*a1.T).A
Out[15]: 
array([[ 14,  38,  62],
       [ 38, 126, 214],
       [ 62, 214, 366]], dtype=int32)

In [16]: a.dot(a.T)
Out[16]: 
array([[ 14,  38,  62],
       [ 38, 126, 214],
       [ 62, 214, 366]])

In [21]: (a1*a1.T).diagonal()
Out[21]: array([ 14, 126, 366], dtype=int32)

For something that is quite sparse taking the full matrix multiplication followed by diagonal might be as fast as any alternative. Iterating over the rows of a sparse matrix is a relatively slow operation, while the matrix multiplication has been implemented in fast c code.

Another way - element multiplication followed by sum.

In [22]: np.sum(a*a,axis=1)
Out[22]: array([ 14, 126, 366])

In [23]: a1.multiply(a1).sum(axis=1)
Out[23]: 
matrix([[ 14],
        [126],
        [366]], dtype=int32)

sparse implements sum as a matrix multiplication (by a column of ones).

In [26]: a1.multiply(a1)*np.array([1,1,1,1])[:,None]
Out[26]: 
array([[ 14],
       [126],
       [366]], dtype=int32)