MPI vs OpenMP for shared memory

Shibli · Jul 4, 2012 · Viewed 23.3k times

Let's say there is a computer with 4 CPUs, each having 2 cores, so 8 cores in total. With my limited understanding, I think that all processors share the same memory in this case. Now, is it better to use OpenMP directly, or to use MPI to keep the code general so that it could work in both distributed and shared-memory settings? Also, if I use MPI in a shared-memory setting, would performance decrease compared with OpenMP?

Answer

Whether you need or want MPI or OpenMP (or both) depends heavily on the type of application you are running, and on whether your problem is mostly memory-bound or CPU-bound (or both). Furthermore, it depends on the type of hardware you are running on. A few examples:

Example 1

You need parallelization because you are running out of memory, e.g. you have a simulation and the problem size is so large that your data does not fit into the memory of a single node anymore. However, the operations you perform on the data are rather fast, so you do not need more computational power.

In this case you probably want to use MPI and start one MPI process on each node, thereby making maximum use of the available memory while limiting communication to the bare minimum.
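As a rough illustration of how the data could be spread over the nodes, here is a minimal sketch in C of a block decomposition. The names `block_range` and `owned_block` are hypothetical and not part of MPI; the sketch assumes that `rank` and `size` would come from `MPI_Comm_rank` and `MPI_Comm_size` in a real program.

```c
#include <stddef.h>

/* Half-open index range [begin, end) of the global data owned by one process. */
typedef struct {
    size_t begin;
    size_t end;
} block_range;

/* Hypothetical helper (not an MPI call): split n items into `size` contiguous,
   nearly equal blocks. The first (n % size) ranks receive one extra item.
   In an MPI program, rank and size would be obtained from MPI_Comm_rank and
   MPI_Comm_size. */
block_range owned_block(size_t n, int rank, int size)
{
    size_t base = n / (size_t)size;   /* items every rank gets */
    size_t extra = n % (size_t)size;  /* leftover items to distribute */
    size_t r = (size_t)rank;
    block_range b;

    b.begin = r * base + (r < extra ? r : extra);
    b.end = b.begin + base + (r < extra ? 1 : 0);
    return b;
}
```

Each process would then allocate only `end - begin` elements instead of all `n`, which is what lets the combined memory of several nodes hold a problem that does not fit on one.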

Example 2

You usually have small datasets and only want to speed up your application, which is computationally heavy. Also, you do not want to spend much time thinking about parallelization, but would rather focus on your algorithms in general.

In this case OpenMP is your first choice. You only need to add a few directives here and there (e.g. in front of the for loops that you want to accelerate), and if your program is not too complex, OpenMP will do the rest for you automatically.
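To give an idea of what this looks like in practice, here is a minimal sketch in C. The function `sum_of_squares` is a made-up example, not something from the question; the only OpenMP-specific part is the single `#pragma` line in front of the loop.

```c
#include <stddef.h>

/* Illustrative example: sum of the squares of a[0..n-1].
   The pragma splits the loop iterations among the OpenMP threads, and the
   reduction clause gives each thread a private partial sum that is combined
   at the end. Compiled without OpenMP support, the pragma is ignored and the
   loop simply runs serially. */
double sum_of_squares(const double *a, size_t n)
{
    double total = 0.0;

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; ++i) {
        total += a[i] * a[i];
    }
    return total;
}
```

With GCC, OpenMP is enabled by the `-fopenmp` flag. The serial code stays intact, which is why this route costs so little effort compared with restructuring a program around message passing.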

Example 3

You want it all. You need more memory, i.e. more computing nodes, but you also want to speed up your calculations as much as possible, i.e. running on more than one core per node.

Now your hardware comes into play. From my personal experience, if you have only a few cores per node (4-8), the performance penalty created by the general overhead of using OpenMP (i.e. starting up the OpenMP threads etc.) is larger than the overhead of intra-node MPI communication (i.e. sending MPI messages between processes that actually share memory and would not need MPI to communicate).
However, if you are working on a machine with more cores per node (16+), a hybrid approach, i.e. parallelizing with MPI and OpenMP at the same time, becomes necessary to make full use of your computational resources. It is also the most difficult to code and to maintain.

Summary
If you have a problem that is small enough to be run on just one node, use OpenMP. If you know that you need more than one node (and thus definitely need MPI), but you favor code readability and low development effort over performance, use only MPI. If using MPI only does not give you the speedup you would like or require, you have to do it all and go hybrid.

To your second question (in case that did not become clear):
If your setup is such that you do not need MPI at all (because you will always run on only one node), use OpenMP, as it will be faster. But if you know that you need MPI anyway, I would start with that and only add OpenMP later, when you know that you've exhausted all reasonable optimization options for MPI.