Multiprocessing or Multithreading?

Ram Rachum · Apr 8, 2009 · Viewed 8.9k times

I'm making a program for running simulations in Python, with a wxPython interface. In the program, you can create a simulation, and the program renders (i.e. calculates) it for you. Rendering can sometimes be very time-consuming.

When the user starts a simulation and defines an initial state, I want the program to render the simulation continuously in the background, while the user may be doing different things in the program. Sort of like a YouTube-style bar that fills up: you can play the simulation only up to the point that has been rendered.

Should I use multiple processes or multiple threads, or something else? People told me to use the multiprocessing package. I checked it out and it looks good, but I also heard that processes, unlike threads, can't share a lot of information (and I think my program will need to share a lot of information). I've also heard about Stackless Python: is it a separate option? I have no idea.

Please advise.

Answer

S.Lott · Apr 9, 2009

"I checked it out and it looks good, but I also heard that processes, unlike threads, can't share a lot of information..."

This is only partially true.

Threads are part of a process -- threads share memory trivially. That is as much a problem as a help -- two threads with casual disregard for each other can overwrite memory and create serious problems.
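As a minimal sketch of that "casual disregard", assuming nothing beyond the standard threading module: two threads do an unsynchronized read-modify-write on the same variable, so updates can be lost.

    import threading

    counter = 0  # shared state, deliberately unprotected

    def bump(times):
        global counter
        for _ in range(times):
            counter += 1  # read-modify-write; another thread can interleave here

    threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # May print less than 400000, because interleaved updates overwrite each
    # other; a threading.Lock around the increment would fix it.
    print(counter)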

Processes, however, share information through a lot of mechanisms. A POSIX pipeline (a | b) means that process a and process b share information -- a writes it and b reads it. This works out really well for a lot of things.
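The same write-one-end, read-the-other-end pattern is available inside Python through the multiprocessing package. A rough sketch, with made-up producer/consumer names and fake frame data standing in for real simulation state:

    from multiprocessing import Pipe, Process

    def producer(conn):
        # "a writes it": push a few fake simulation frames down the pipe
        for frame in range(5):
            conn.send({"frame": frame, "state": frame * frame})
        conn.send(None)  # sentinel: nothing more to come
        conn.close()

    def consumer(conn):
        # "b reads it": keep reading until the sentinel arrives
        while True:
            msg = conn.recv()
            if msg is None:
                break
            print("received", msg)

    if __name__ == "__main__":
        read_end, write_end = Pipe()
        a = Process(target=producer, args=(write_end,))
        b = Process(target=consumer, args=(read_end,))
        a.start()
        b.start()
        a.join()
        b.join()

The explicit sentinel message keeps the consumer's exit condition simple, instead of relying on end-of-file detection on the pipe.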

The operating system will assign your processes to every available core as quickly as you create them. This works out really well for a lot of things.
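For example, a multiprocessing.Pool spreads work across the cores without any explicit scheduling on your part. This sketch assumes a stand-in render_frame function rather than your real simulation step:

    from multiprocessing import Pool, cpu_count

    def render_frame(step):
        # stand-in for an expensive simulation step; just burns CPU
        # so there is something worth parallelizing
        return sum(i * i for i in range(200_000)) + step

    if __name__ == "__main__":
        with Pool() as pool:  # defaults to one worker per core
            frames = pool.map(render_frame, range(8))
        print("rendered", len(frames), "frames using up to", cpu_count(), "cores")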

Stackless Python is unrelated to this discussion -- it's faster and has different thread scheduling. But I don't think threads are the best route for this.

"I think my program will need to share a lot of information."

You should resolve this first. Then determine how to structure your processes around the flow of information. A "pipeline" is very easy and natural to build; any shell will create one trivially.
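For the simulation described in the question, the pipeline could be a single renderer process that keeps pushing finished frames onto a queue while the wxPython GUI drains it at its own pace. This is only a sketch with made-up names; a real GUI would poll the queue from a timer instead of blocking:

    from multiprocessing import Process, Queue

    def render_worker(initial_state, frames):
        # stand-in for the real renderer: advance the state, publish each frame
        state = initial_state
        for step in range(100):
            state = state + 1          # fake simulation step
            frames.put((step, state))
        frames.put(None)               # sentinel: rendering is finished

    if __name__ == "__main__":
        frames = Queue()
        Process(target=render_worker, args=(0, frames), daemon=True).start()

        rendered = []                  # what the fills-up bar would reflect
        while True:
            item = frames.get()
            if item is None:
                break
            rendered.append(item)
        print("frames available for playback:", len(rendered))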

A "server" is another architecture where multiple client processes get and/or put information into a central server. This is a great way to share information. You can use the WSGI reference implementation as a way to build a simple, reliable server.