I've been reading about Python's multiprocessing module. I still don't think I have a very good understanding of what it can do.
Let's say I have a quad-core processor and a list of 1,000,000 integers, and I want the sum of all of them. I could simply do:
list_sum = sum(my_list)
But this only sends it to one core.
Is it possible, using the multiprocessing module, to divide the array up and have each core get the sum of its part and return the value so the total sum may be computed?
Something like:
core1_sum = sum(my_list[0:500000])        # goes to core 1
core2_sum = sum(my_list[500000:1000000])  # goes to core 2 (starts at 500000 so no element is skipped)
all_core_sum = core1_sum + core2_sum      # core 3 does the final computation
Any help would be appreciated.
Yes, it's possible to do this summation over several processes, very much like doing it with multiple threads:
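from multiprocessing import Process, Queue

def do_sum(q, l):
    # Put this chunk's partial sum on the shared queue.
    q.put(sum(l))

def main():
    my_list = list(range(1000000))
    q = Queue()
    # Each process receives a copy of its half of the list.
    p1 = Process(target=do_sum, args=(q, my_list[:500000]))
    p2 = Process(target=do_sum, args=(q, my_list[500000:]))
    p1.start()
    p2.start()
    # Collect both partial sums, in whichever order they arrive.
    r1 = q.get()
    r2 = q.get()
    # Wait for both worker processes to exit.
    p1.join()
    p2.join()
    print(r1 + r2)

if __name__ == '__main__':
    main()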
However, doing it with multiple processes is likely slower than doing it in a single process, as copying the data back and forth is more expensive than summing it right away.
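For splitting the work across all four cores rather than two, a process pool may be more convenient than managing Process objects by hand. The following is only a sketch: the helper names split_into_chunks and parallel_sum and the default of 4 workers are my own choices for illustration, not anything the module prescribes. Each chunk is still copied to its worker, so the overhead caveat above applies here too.

```python
from multiprocessing import Pool


def split_into_chunks(values, n_chunks):
    """Split values into n_chunks contiguous slices of nearly equal size.

    Hypothetical helper written for this example.
    """
    chunk_size, remainder = divmod(len(values), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `remainder` chunks get one extra element each.
        end = start + chunk_size + (1 if i < remainder else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def parallel_sum(values, workers=4):
    """Sum values by handing one chunk to each worker process.

    Hypothetical helper; workers=4 assumes a quad-core machine.
    """
    chunks = split_into_chunks(values, workers)
    with Pool(processes=workers) as pool:
        # Each chunk is pickled and sent to a worker process;
        # only the partial sum is sent back.
        partial_sums = pool.map(sum, chunks)
    return sum(partial_sums)


if __name__ == "__main__":
    my_list = list(range(1000000))
    # Compare against the single-process result.
    print(parallel_sum(my_list) == sum(my_list))
```

Passing the built-in sum directly to pool.map avoids defining a separate worker function. Timing parallel_sum against plain sum(my_list) on your own machine is the easiest way to see whether the copying cost outweighs the parallel speedup for your data.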