concurrent.futures.ThreadPoolExecutor.map is slower than a for loop

inspectorG4dget picture inspectorG4dget · Jan 18, 2014 · Viewed 7.7k times · Source

I am playing with concurrent.futures.ThreadPoolExecutor to see if I can squeeze more work out of my quad-core processor (with 8 logical cores). So I wrote the following code:

from concurrent import futures

def square(n):
    return n**2

def threadWorker(t):
    n, d = t
    if n not in d:
        d[n] = square(n)

def master(n, numthreads):
    d = {}
    with futures.ThreadPoolExecutor(max_workers=numthreads) as e:
        for i in e.map(threadWorker, ((i, d) for i in range(n))):
            pass  # done so that it actually fetches each result. threadWorker has its own side-effects on d
    return len(d)

if __name__ == "__main__":
    print('starting')
    print(master(10**6, 6))
    print('done')

The interesting thing is that the same functionality, when written in a for-loop takes about a second:

>>> d = {}
>>> for i in range(10**6):
...     if i not in d: d[i] = i**2

... while the threadpool code takes well over 10 seconds. Now I know that it's using at least 4 threads because I see the processor load on each of my cores. But even with shared memory (I can understand why processes might take a while, due to memory copying), I feel that this disparity in runtime is far too huge.

Does anyone have any ideas as to why this might take so long? It seems that a simple squaring operation, which is indeed highly parallelizable, should really not take so long. Could it perhaps be due to the population of the dictionary (if so, what is causing the slowdown there?)?

Technical details:

  • Python 3.3.3
  • quad-core (8 logical cores with hypertheading) CPU
  • MAC OSX 10.9.1 (Mavericks)

Answer

bitcycle picture bitcycle · Jan 18, 2014

You're using async threads to try and make CPU-bound work concurrent? I wouldn't recommend it. Use processes instead, otherwise the GIL will slow things down more and more as the size of your thread pool increases.

[Edit 1]

Similar question with references to the GIL explanation from David Beazly (sp?).

Python code performance decreases with threading