I have two sparse matrices (a
and b
) in python of the following dimensions:
a = <240760x2177930 sparse matrix of type '<class 'numpy.float64'>'
with 1127853 stored elements in Compressed Sparse Row format>
and
b = <240760x2177930 sparse matrix of type '<class 'numpy.float64'>'
with 439309 stored elements in Compressed Sparse Row format>
Question: I'd like to get a column vector of length 240760 that is the row-wise dot product of the two matrices. For example, dot(a[0],b[0])
would be the first element of my output vector. dot(a[1],b[1])
would be the second, and so forth.
Is there a vectorized easy way to accomplish this?
EDIT: One way to accomplish this would be to convert each row into a dense vector, flatten it out, and use numpy.dot()
. Something like:
np.dot(np.array(a[0]).flatten(),np.array(b[0]).flatten()).
But this requires iterating row wise and convert each row into a dense vector, which is very time consuming. I'm thinking there's probably an easier way to do this...
A scipy
sparse matrix is modeled on the numpy
matrix subclass, and as such implements *
as matrix multiplication. a.multiply
is element by element muliplication, such as that used by np.array
*
.
I'd suggest making a couple of small matrices, and try the various forms of multiplication, including what you think is the np.dot
equivalent. It will be easier to tell what's going on with something small.
a = np.arange(12).reshape(3,4)
a1 = sparse.csr_matrix(a)
np.dot(a, a.T)
a1 * a.T
a*a
a1.multiply(a1)
etc
Just for reference, is this what you want (using dense arrays):
In [7]: a=np.arange(12).reshape(3,4)
In [8]: [np.dot(a[i],a[i]) for i in range(3)]
Out[8]: [14, 126, 366]
In [9]: np.einsum('ij,ij->i',a,a)
Out[9]: array([ 14, 126, 366])
and the sparse
In [11]: a1=sparse.csr_matrix(a)
The full matrix or dot product is more that what you want, right? You just want the diagonal.
In [15]: (a1*a1.T).A
Out[15]:
array([[ 14, 38, 62],
[ 38, 126, 214],
[ 62, 214, 366]], dtype=int32)
In [16]: a.dot(a.T)
Out[16]:
array([[ 14, 38, 62],
[ 38, 126, 214],
[ 62, 214, 366]])
In [21]: (a1*a1.T).diagonal()
Out[21]: array([ 14, 126, 366], dtype=int32)
For something that is quite sparse taking the full matrix multiplication followed by diagonal might be as fast as any alternative. Iterating over the rows of a sparse matrix is a relatively slow operation, while the matrix multiplication has been implemented in fast c code.
Another way - element multiplication followed by sum.
In [22]: np.sum(a*a,axis=1)
Out[22]: array([ 14, 126, 366])
In [23]: a1.multiply(a1).sum(axis=1)
Out[23]:
matrix([[ 14],
[126],
[366]], dtype=int32)
sparse implements sum
as a matrix multiplication (by a column of ones).
In [26]: a1.multiply(a1)*np.array([1,1,1,1])[:,None]
Out[26]:
array([[ 14],
[126],
[366]], dtype=int32)