Explain the speed difference between numpy's vectorized function application VS python's for loop

Kao picture Kao · Jul 5, 2013 · Viewed 9.9k times · Source

I was implementing a weighting system called TF-IDF on a set of 42000 images, each consisting 784 pixels. This is basically a 42000 by 784 matrix.

The first method I attempted made use of explicit loops and took more than 2 hours.

def tfidf(color,img_pix,img_total):
    if img_pix==0:
        return 0
    else:
        return color * np.log(img_total/img_pix)

...

result = np.array([])
for img_vec in data_matrix:
    double_vec = zip(img_vec,img_pix_vec)
    result_row = np.array([tfidf(x[0],x[1],img_total) for x in double_vec])
    try:
        result = np.vstack((result,result_row))
    # first row will throw a ValueError since vstack accepts rows of same len
    except ValueError:
        result = result_row

The second method I attempted used numpy matrices and took less than 5 minutes. Note that data_matrix, img_pix_mat are both 42000 by 784 matrices while img_total is a scalar.

result = data_matrix * np.log(np.divide(img_total,img_pix_mat))

I was hoping someone could explain the immense difference in speed.

The authors of the following paper entitled "The NumPy array: a structure for efficient numerical computation" (http://arxiv.org/pdf/1102.1523.pdf), state on the top of page 4 that they observe a 500 times speed increase due to vectorized computation. I'm presuming much of the speed increase I'm seeing is due to this. However, I would like to go a step further and ask why numpy vectorized computations are that much faster than standard python loops?

Also, perhaps you guys might know of other reasons why the first method is slow. Do try/except structures have high overhead? Or perhaps forming a new np.array for each loop is takes a long time?

Thanks.

Answer

usethedeathstar picture usethedeathstar · Jul 5, 2013

Due to the internal workings of numpy, (as far as i know, numpy works with C internally, so everything you push down to numpy is actually much faster because it is in a different language)

edit: taking out the zip, and replacing it with a vstack should go faster too, (zip tends to go slow if the arguments are very large, and than vstack is faster), (but that is also something which puts it into numpy (thus into C), while zip is python)

and yes, if i understand correctly - not sure about that- , you are doing 42k times a try/except block, that should definately be bad for the speed,

test:

T=numpy.ndarray((5,10))
for t in T:
print t.shape

results in (10,)

This means that yes, if your matrices are 42kx784 matrices, you are trying 42k times a try-except block, i am assuming that should put an effect in the computation times, as well as each time doing a zip each time, but not certain if that would be the main cause,

(so every one of your 42k times you run your stuff, takes 0.17sec, i am quite certain that a try/except block doesnt take 0.17 seconds, but maybe the overhead it causes or so, does contribute to it?

try changing the following:

double_vec = zip(img_vec,img_pix_vec)
result_row = np.array([tfidf(x[0],x[1],img_total) for x in double_vec])

to

result_row=np.array([tfidf(img_vec[i],img_pix_vec[i],img_total) for i in xrange(len(img_vec))])

that at least gets rid of the zip statement, But not sure if the zip statement takes down your time by one min, or by nearly two hours (i know zip is slow, compared to numpy vstack, but no clue if that would give you two hours time gain)