vector<int> v;

#pragma omp parallel for ordered schedule(dynamic, anyChunkSizeGreaterThan1)
for (int i = 0; i < n; ++i) {
    ...
    ...
    ...
    #pragma omp ordered
    v.push_back(i);
}
This fills v with an n-sized ordered list.

When reaching the omp ordered block, all threads need to wait for the thread handling the lowest pending iteration to finish, but what if none of the threads was assigned that specific iteration? Or does the OpenMP runtime library always make sure that the lowest iteration is handled by some thread?
Also, why is it suggested that the ordered clause be used along with the dynamic schedule? Would the static schedule affect performance?
The ordered clause works like this: different threads execute concurrently until they encounter the ordered region, which is then executed sequentially in the same order as it would be executed in a serial loop. This still allows for some degree of concurrency, especially if the code section outside the ordered region has substantial run time.
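To make that concrete, here is a minimal, self-contained sketch (not the code from the question; heavy_work is an invented stand-in for the elided "..." part) in which the work before the ordered region runs in parallel while the push_back is serialised in loop order:

#include <cmath>
#include <cstdio>
#include <vector>
#include <omp.h>

// Invented placeholder for the "..." part of the question's loop body.
static double heavy_work(int i) {
    double x = 0.0;
    for (int k = 0; k < 100000; ++k)
        x += std::sin(i + k);
    return x;
}

int main() {
    const int n = 16;
    std::vector<int> v;

    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < n; ++i) {
        double r = heavy_work(i);   // runs concurrently on all threads
        #pragma omp ordered
        {
            // executed by one thread at a time, in loop order
            v.push_back(i);
            std::printf("thread %d pushed %d (r=%g)\n",
                        omp_get_thread_num(), i, r);
        }
    }
    // v now holds 0, 1, ..., n-1 regardless of the number of threads.
    return 0;
}

Compiled with e.g. g++ -fopenmp, the pushes always appear in increasing order of i, while the heavy_work calls overlap across threads.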
There is no particular reason to use the dynamic schedule instead of the static schedule with a small chunk size; it all depends on the structure of the code. Since ordered introduces a dependency between the threads, with schedule(static) and the default chunk size the second thread has to wait for the first one to finish all of its iterations, then the third thread has to wait for the second one to finish its iterations (and hence for the first one too), and so on. One can easily visualise it with 3 threads and 9 iterations (3 per thread):
tid  List of      Timeline
     iterations
 0   0,1,2        ==o==o==o
 1   3,4,5        ==.......o==o==o
 2   6,7,8        ==..............o==o==o
= shows that the thread is executing code in parallel, o is when the thread is executing the ordered region, and . is the thread being idle, waiting for its turn to execute the ordered region. With schedule(static,1) the following would happen:
tid  List of      Timeline
     iterations
 0   0,3,6        ==o==o==o
 1   1,4,7        ==.o==o==o
 2   2,5,8        ==..o==o==o
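If you want to see the iteration-to-thread mapping from the tables above on your own machine, the following sketch (hypothetical, not part of the original answer) uses schedule(runtime), so either timeline can be reproduced by setting the OMP_SCHEDULE environment variable (e.g. OMP_SCHEDULE="static" versus OMP_SCHEDULE="static,1") without recompiling:

#include <cstdio>
#include <omp.h>

int main() {
    omp_set_num_threads(3);   // match the 3-thread example above

    // The actual schedule kind and chunk size are taken from OMP_SCHEDULE.
    #pragma omp parallel for ordered schedule(runtime)
    for (int i = 0; i < 9; ++i) {
        #pragma omp ordered
        std::printf("iteration %d executed by thread %d\n",
                    i, omp_get_thread_num());
    }
    return 0;
}

Because of the ordered region, the lines are always printed for i = 0 to 8 in order; what changes between the two schedules is which thread number appears next to each iteration.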
I believe the difference in both cases is more than obvious. With schedule(dynamic) the pictures above would become more or less random, as the list of iterations assigned to each thread is non-deterministic. It would also add additional overhead. Dynamic scheduling is only useful when the amount of computation differs from iteration to iteration and the computation takes much more time than the overhead it adds.
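As a rough illustration of that last point, here is a hedged sketch (uneven_work is an invented name and the constants are arbitrary) of the case where dynamic scheduling does pay off: the cost per iteration grows with i, so a plain static partition would leave the threads holding the cheap iterations idle near the end:

#include <cmath>
#include <cstdio>
#include <omp.h>

// Invented example of per-iteration work whose cost grows with i.
static double uneven_work(int i) {
    double x = 0.0;
    for (int k = 0; k < (i + 1) * 20000; ++k)
        x += std::cos(k * 1e-3);
    return x;
}

int main() {
    const int n = 64;
    double sum = 0.0;
    double t0 = omp_get_wtime();

    // Chunks of 4 iterations are handed out on demand, so threads that finish
    // early pick up more of the expensive tail iterations.
    #pragma omp parallel for schedule(dynamic, 4) reduction(+ : sum)
    for (int i = 0; i < n; ++i)
        sum += uneven_work(i);

    std::printf("sum = %g, elapsed = %g s\n", sum, omp_get_wtime() - t0);
    return 0;
}

Swapping schedule(dynamic, 4) for schedule(static) and comparing the elapsed time shows whether the extra bookkeeping of dynamic scheduling is worth it for a given workload.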
Don't worry about the lowest numbered iteration. It is usually handed to the first thread in the team that becomes ready to execute code.