What is the quickest way to iterate through a NumPy array?

piRSquared · Nov 14, 2016

I noticed a meaningful difference between iterating through a NumPy array "directly" and iterating through it via the tolist method. These are the two versions I timed:

directly
[i for i in np.arange(10000000)]
via tolist
[i for i in np.arange(10000000).tolist()]
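To reproduce a comparison like this on your own machine, here is a minimal sketch using the standard timeit module. It assumes N = 1,000,000 (smaller than the 10,000,000 above so it finishes quickly), and the function names are hypothetical. Absolute numbers will vary with hardware and NumPy version, so no results are shown here.

```python
import timeit
from functools import partial

import numpy as np

# Smaller than 10,000,000 so the script runs quickly; the relative gap is what matters
N = 1_000_000
arr = np.arange(N)


def iterate_directly(a):
    # Each step hands Python a freshly created NumPy scalar
    return [i for i in a]


def iterate_via_tolist(a):
    # One bulk conversion to Python ints, then a plain list comprehension
    return [i for i in a.tolist()]


if __name__ == "__main__":
    for func in (iterate_directly, iterate_via_tolist):
        # Average over a few runs to smooth out noise
        seconds = timeit.timeit(partial(func, arr), number=3) / 3
        print(f"{func.__name__}: {seconds * 1000:.1f} ms per run")
```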

Considering I've discovered one way to go faster, I wanted to ask: what else might make it go faster?

What is the fastest way to iterate through a NumPy array?

Answer

James · Nov 14, 2016

This is actually not surprising. Let's examine the methods one at a time, starting with the slowest.

[i for i in np.arange(10000000)]

This method asks Python to reach into the NumPy array (whose data lives in a C-level buffer) one element at a time: for every element, NumPy allocates a new Python object (a NumPy scalar such as np.int64), and the list stores a pointer to that object. Each hop from the array in the C backend into pure Python carries an overhead cost, and this method pays that cost 10,000,000 times.
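A small sketch of that per-element boxing, assuming a tiny array for readability: iterating the array directly yields NumPy scalar objects rather than Python ints, and every access builds a brand-new object.

```python
import numpy as np

arr = np.arange(5)

# Iterating the array directly: NumPy wraps each element in a new scalar object
elements = [i for i in arr]

print(type(elements[0]))             # a NumPy integer type (e.g. np.int64), not int
print(isinstance(elements[0], int))  # False: NumPy integer scalars don't subclass int
print(arr[0] is arr[0])              # False: each access allocates a fresh scalar
```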

Next:

[i for i in np.arange(10000000).tolist()]

In this case, .tolist() makes a single call into NumPy's C backend, which converts every element to a plain Python int and builds the whole list in one shot. The list comprehension then only has to iterate over an ordinary Python list.
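A minimal sketch of the "convert once, then loop in Python" pattern, again using a tiny array as an assumption: after .tolist(), the elements are plain Python ints and the comprehension never touches NumPy again.

```python
import numpy as np

arr = np.arange(5)

# One call into NumPy converts every element to a plain Python int...
as_list = arr.tolist()

# ...after which the comprehension is an ordinary loop over a Python list
elements = [i for i in as_list]

print(type(elements[0]))  # <class 'int'>
print(elements)           # [0, 1, 2, 3, 4]
```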

Finally:

list(np.arange(10000000))

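This skips the Python-level comprehension, but, like the first method, it creates a list of NumPy scalar objects (e.g. np.int64) rather than Python ints, because list() pulls the elements out of the array one at a time. Its timing should be in the same ballpark as np.arange(10000000).tolist(), although .tolist() is usually the faster of the two, so it is worth timing both on your own data.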
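A sketch comparing list(arr) with arr.tolist(), assuming a 1,000,000-element array: the two lists hold equal values but different element types. The timing loop only measures; results depend on your machine and NumPy version.

```python
import timeit
from functools import partial

import numpy as np

arr = np.arange(1_000_000)

from_list = list(arr)       # elements are NumPy scalars (e.g. np.int64)
from_tolist = arr.tolist()  # elements are plain Python ints

print(from_list == from_tolist)              # True: same values
print(type(from_list[0]), type(from_tolist[0]))

# Measure rather than assume which is faster
for label, stmt in (("list(arr)", partial(list, arr)), ("arr.tolist()", arr.tolist)):
    per_run = timeit.timeit(stmt, number=3) / 3
    print(f"{label}: {per_run * 1000:.1f} ms per run")
```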


So, in terms of iteration, the primary advantage of using NumPy is that you don't need to iterate. Operations are applied in a vectorized fashion over the whole array, and iterating just slows things down. If you find yourself iterating over array elements, look for a way to restructure your algorithm so that it uses only NumPy operations (it has soooo many built-in!). If that is really not possible, you can use np.apply_along_axis, np.apply_over_axes, or np.vectorize, but treat them as conveniences rather than speedups: np.vectorize is documented as essentially a for loop, and np.apply_along_axis calls your function once per 1-D slice from Python.
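A minimal sketch of the restructuring advice, using an arbitrary example computation (x * x + 1 and row sums, chosen only for illustration; square_plus_one is a hypothetical helper). The vectorized expression and the built-in .sum(axis=1) run in NumPy's compiled loops, while np.vectorize and np.apply_along_axis give the same answers but still call Python code repeatedly.

```python
import numpy as np

arr = np.arange(10)

# Loop version: pulls every element into Python one at a time
loop_result = [x * x + 1 for x in arr.tolist()]

# Vectorized version: the whole expression runs inside NumPy
vectorized_result = arr * arr + 1


def square_plus_one(x):
    # Hypothetical per-element function, for illustration only
    return x * x + 1


# np.vectorize offers a NumPy-style interface, but calls the function per element
vectorize_result = np.vectorize(square_plus_one)(arr)

# Row sums: built-in reduction vs. apply_along_axis (a Python loop over rows)
matrix = arr.reshape(2, 5)
row_sums = matrix.sum(axis=1)
row_sums_apply = np.apply_along_axis(np.sum, 1, matrix)

print(vectorized_result.tolist() == loop_result)  # True
print(vectorize_result.tolist() == loop_result)   # True
print(row_sums.tolist(), row_sums_apply.tolist())
```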