Python: using multiprocessing is slower than not using it

Rhys · Jan 8, 2012 · Viewed 17k times

After spending a lot of time trying to wrap my head around multiprocessing I came up with this code which is a benchmark test:

Example 1:

from multiprocessing import Process

class Alter(Process):
    def __init__(self, word):
        Process.__init__(self)
        self.word = word
        self.word2 = ''

    def run(self):
        # Alter string + test processing speed
        for i in range(80000):
            self.word2 = self.word2 + self.word

if __name__=='__main__':
    # Send a string to be altered
    thread1 = Alter('foo')
    thread2 = Alter('bar')
    thread1.start()
    thread2.start()

    # wait for both to finish

    thread1.join()
    thread2.join()

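    # Note: run() executes in the child processes, so the parent's copies
    # of word2 are still '' here and these two calls print empty lines.
    print(thread1.word2)
    print(thread2.word2)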

This completes in 2 seconds (half the time of the multithreading version). Out of curiosity I decided to run this next:

Example 2:

word2 = 'foo'
word3 = 'bar'

word = 'foo'
for i in range(80000):
    word2 = word2 + word

word  = 'bar'
for i in range(80000):
    word3 = word3 + word

print(word2)
print(word3)

To my horror this ran in less than half a second!

What is going on here? I expected multiprocessing to run faster. Shouldn't it complete in half of Example 2's time, given that Example 1 is Example 2 split into two processes?

Update:

After considering Chris' feedback, I have included the 'actual' code that consumes the most processing time and led me to consider multiprocessing:

# Placeholder: each of the four inner lists holds 13379+ strings
self.ListVar = [[13379+ strings],[13379+ strings],
                [13379+ strings],[13379+ strings]]

for b in range(len(self.ListVar)):
    self.list1 = []
    self.temp = []
    for n in range(len(self.ListVar[b])):
        # Only label a string the first time it is seen
        if self.ListVar[b][n] not in self.temp:
            self.list1.insert(n, self.ListVar[b][n] + '(' +
                              str(self.ListVar[b].count(self.ListVar[b][n])) +
                              ')')
            self.temp.insert(0, self.ListVar[b][n])

    self.ListVar[b] = list(self.list1)
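
For reference, here is a rough sketch of what the loop above appears to compute for one inner list: each distinct string, in first-seen order, followed by its count in parentheses. It assumes Python 3.7+ (where Counter keeps insertion order). The helper name label_counts and the sample data are hypothetical, not part of the original code.

```python
from collections import Counter


def label_counts(strings):
    """Return 'item(count)' for each distinct string, in first-seen order."""
    counts = Counter(strings)  # single pass; keys keep first-seen order (3.7+)
    return [f"{item}({count})" for item, count in counts.items()]


if __name__ == '__main__':
    sample = ['a', 'b', 'a', 'c', 'b', 'a']  # made-up data for illustration
    print(label_counts(sample))  # ['a(3)', 'b(2)', 'c(1)']
```

The original loop scans temp for every element and calls count() for every new string, so its cost grows roughly quadratically with the list length. This sketch counts each list once, which may matter more here than how the work is split across processes.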

Answer

Chris Eberle · Jan 8, 2012

This example is too small to benefit from multiprocessing.

There's a lot of overhead when starting a new process. If there were heavy processing involved, that overhead would be negligible. But your example really isn't all that intensive, so you're bound to notice the overhead.
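
One way to see that overhead is to time a process that does no work at all and compare it with the inline loop from the question. This is only a minimal sketch. The function names (noop, time_empty_process, time_inline_concat) are made up for illustration, and the numbers will vary by machine and operating system.

```python
import time
from multiprocessing import Process


def noop():
    """Do nothing, so any time measured is process start-up and teardown."""
    pass


def time_empty_process():
    """Return the seconds taken to start and join one process doing no work."""
    start = time.perf_counter()
    p = Process(target=noop)
    p.start()
    p.join()
    return time.perf_counter() - start


def time_inline_concat(word, repeats=80000):
    """Return (seconds, result) for the question's concatenation loop, run inline."""
    start = time.perf_counter()
    result = ''
    for _ in range(repeats):
        result = result + word
    return time.perf_counter() - start, result


if __name__ == '__main__':
    overhead = time_empty_process()
    inline_time, _ = time_inline_concat('foo')
    print(f"empty process start + join: {overhead:.4f}s")
    print(f"inline loop (80000 concatenations): {inline_time:.4f}s")
```

If the first number is comparable to or larger than the second, the work is too small for a separate process to pay off.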

You'd probably notice a bigger difference with real threads. Too bad Python (well, CPython) has issues with CPU-bound threading.
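
To illustrate the CPU-bound threading issue, here is a small sketch that runs the same pure-Python workload sequentially and then in two threads. In a standard CPython build the global interpreter lock lets only one thread execute Python bytecode at a time, so the threaded run is not expected to be much faster. The workload (sum_squares) and the sizes are stand-ins chosen for illustration, not code from the question.

```python
import threading
import time


def sum_squares(n, out, index):
    """CPU-bound stand-in workload: store the sum of i*i for i < n in out[index]."""
    total = 0
    for i in range(n):
        total += i * i
    out[index] = total


def run_sequential(sizes):
    """Run each workload one after another; return (seconds, results)."""
    out = [None] * len(sizes)
    start = time.perf_counter()
    for index, n in enumerate(sizes):
        sum_squares(n, out, index)
    return time.perf_counter() - start, out


def run_threaded(sizes):
    """Run each workload in its own thread; return (seconds, results)."""
    out = [None] * len(sizes)
    threads = [threading.Thread(target=sum_squares, args=(n, out, index))
               for index, n in enumerate(sizes)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, out


if __name__ == '__main__':
    sizes = [2_000_000, 2_000_000]  # arbitrary sizes for illustration
    seq_time, seq_out = run_sequential(sizes)
    thr_time, thr_out = run_threaded(sizes)
    print(f"sequential: {seq_time:.3f}s  threaded: {thr_time:.3f}s")
    print("same results:", seq_out == thr_out)
```

Both runs produce the same results. Only the timings differ, and they depend on the interpreter build and the machine.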