Parallelizing a Numpy vector operation

user1475412 · Jul 12, 2012 · Viewed 36.5k times

Let's use, for example, numpy.sin()

The following code will return the value of the sine for each value of the array a:

import numpy
a = numpy.arange(1000000)
result = numpy.sin(a)

But my machine has 32 cores, so I'd like to make use of them. (The overhead might not be worthwhile for something like numpy.sin(), but the function I actually want to use is quite a bit more complicated, and I will be working with a huge amount of data.)

Is this the best (read: smartest or fastest) method:

import numpy
from multiprocessing import Pool

if __name__ == '__main__':
    a = numpy.arange(1000000)
    pool = Pool()
    result = pool.map(numpy.sin, a)

or is there a better way to do this?
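For reference, a chunked variant of the same idea (an untested sketch on my part; it assumes the real function, like numpy.sin, accepts whole arrays) would split a so each worker processes a block instead of a single element:

import numpy
from multiprocessing import Pool

if __name__ == '__main__':
    a = numpy.arange(1000000)
    pool = Pool()                      # defaults to one worker per core
    chunks = numpy.array_split(a, 32)  # e.g. 32 chunks for a 32-core machine
    # numpy.sin runs on a whole chunk in each worker, so the inter-process
    # overhead is paid once per chunk rather than once per element.
    result = numpy.concatenate(pool.map(numpy.sin, chunks))
    pool.close()
    pool.join()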

Answer

jorgeca · Jul 12, 2012

There is a better way: numexpr

Slightly reworded from their main page:

It's a multi-threaded VM written in C that analyzes expressions, rewrites them more efficiently, and compiles them on the fly into code that achieves near-optimal parallel performance for both memory-bound and CPU-bound operations.

For example, on my 4-core machine, evaluating the sine is just slightly less than 4 times faster than numpy.

In [1]: import numpy as np
In [2]: import numexpr as ne
In [3]: a = np.arange(1000000)
In [4]: timeit ne.evaluate('sin(a)')
100 loops, best of 3: 15.6 ms per loop    
In [5]: timeit np.sin(a)
10 loops, best of 3: 54 ms per loop

The documentation, including the list of supported functions, is here. You'll have to check it (or give us more information) to see whether your more complicated function can be evaluated by numexpr.
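As a rough sketch (the expression below is a made-up placeholder, not taken from your question), a more involved formula and an explicit thread count would look something like this:

import numpy as np
import numexpr as ne

a = np.arange(1000000, dtype=np.float64)
b = np.random.rand(1000000)

ne.set_num_threads(4)  # e.g. one thread per core; use 32 on your machine

# The whole expression is compiled and evaluated in a single multi-threaded
# pass over the data, instead of one numpy pass per operation.
result = ne.evaluate('sin(a) * exp(-b**2) + sqrt(abs(a))')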