I'd like to write a function that normalizes the rows of a large sparse matrix (such that they sum to one).
from pylab import *
import scipy.sparse as sp
def normalize(W):
    z = W.sum(0)
    z[z < 1e-6] = 1e-6
    return W / z[None,:]
w = (rand(10,10)<0.1)*rand(10,10)
w = sp.csr_matrix(w)
w = normalize(w)
However, this gives the following exception:
  File "/usr/lib/python2.6/dist-packages/scipy/sparse/base.py", line 325, in __div__
    return self.__truediv__(other)
  File "/usr/lib/python2.6/dist-packages/scipy/sparse/compressed.py", line 230, in __truediv__
    raise NotImplementedError
Are there any reasonably simple solutions? I have looked at this, but am still unclear on how to actually do the division.
This is already implemented in scikit-learn as sklearn.preprocessing.normalize:
from sklearn.preprocessing import normalize
w_normalized = normalize(w, norm='l1', axis=1)
Here axis=1 normalizes by rows; use axis=0 to normalize by columns. Pass the optional argument copy=False to modify the matrix in place.
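
If you would rather not depend on scikit-learn, the same row normalization can be sketched with SciPy alone by left-multiplying with a diagonal matrix of inverse row sums, which keeps everything sparse and avoids the unsupported division. This is only a minimal sketch: the helper name normalize_rows and its eps parameter are made up for illustration, it assumes non-negative entries as in the question, and it assumes a SciPy version that provides scipy.sparse.diags.

```python
import numpy as np
import scipy.sparse as sp

def normalize_rows(W, eps=1e-6):
    """Hypothetical helper: return a CSR matrix whose rows each sum to one."""
    W = sp.csr_matrix(W, dtype=float)
    # Sum along axis=1 to get one total per row, flattened to a 1-D array.
    row_sums = np.asarray(W.sum(axis=1)).ravel()
    # Same guard as in the question: avoid dividing by (near-)zero sums.
    row_sums[row_sums < eps] = eps
    # Left-multiplying by diag(1 / row_sums) scales row i by 1 / row_sums[i].
    D = sp.diags(1.0 / row_sums)
    return D.dot(W).tocsr()

w = sp.csr_matrix(np.array([[1.0, 3.0, 0.0],
                            [0.0, 0.0, 0.0],
                            [2.0, 0.0, 2.0]]))
w_normalized = normalize_rows(w)
```

Rows that are entirely zero stay zero rather than producing NaNs, because their sum is clamped to eps before inverting.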