Parallelism in Python

fmark picture fmark · Jun 7, 2010 · Viewed 13.9k times · Source

What are the options for achieving parallelism in Python? I want to perform a bunch of CPU bound calculations over some very large rasters, and would like to parallelise them. Coming from a C background, I am familiar with three approaches to parallelism:

  1. Message passing processes, possibly distributed across a cluster, e.g. MPI.
  2. Explicit shared memory parallelism, either using pthreads or fork(), pipe(), et. al
  3. Implicit shared memory parallelism, using OpenMP.

Deciding on an approach to use is an exercise in trade-offs.

In Python, what approaches are available and what are their characteristics? Is there a clusterable MPI clone? What are the preferred ways of achieving shared memory parallelism? I have heard reference to problems with the GIL, as well as references to tasklets.

In short, what do I need to know about the different parallelization strategies in Python before choosing between them?

Answer

Will picture Will · Jun 7, 2010

Generally, you describe a CPU bound calculation. This is not Python's forte. Neither, historically, is multiprocessing.

Threading in the mainstream Python interpreter has been ruled by a dreaded global lock. The new multiprocessing API works around that and gives a worker pool abstraction with pipes and queues and such.

You can write your performance critical code in C or Cython, and use Python for the glue.