vector<int> v;

#pragma omp parallel for ordered schedule(dynamic, anyChunkSizeGreaterThan1)
for (int i = 0; i < n; ++i) {
    ...
    ...
    ...
    #pragma omp ordered
    v.push_back(i);
}
This fills v with an n-sized ordered list.

When reaching the omp ordered block, all threads need to wait for the thread handling the lowest pending iteration to finish, but what if none of the threads was assigned that specific iteration? Or does the OpenMP runtime library always make sure that the lowest iteration is handled by some thread?
Also, why is it suggested that the ordered clause be used along with the dynamic schedule? Would the static schedule affect performance?
The ordered clause works like this: different threads execute concurrently until they encounter the ordered region, which is then executed sequentially in the same order as it would be executed in a serial loop. This still allows for some degree of concurrency, especially if the code section outside the ordered region has substantial run time.
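To make that concrete, here is a minimal, self-contained sketch (not the code from the question; heavy_work is an invented stand-in for the elided "..." part) in which the work before the ordered region runs in parallel while the push_back is serialised in loop order:

#include <cmath>
#include <cstdio>
#include <vector>
#include <omp.h>

// Invented placeholder for the "..." part of the question's loop body.
static double heavy_work(int i) {
    double x = 0.0;
    for (int k = 0; k < 100000; ++k)
        x += std::sin(i + k);
    return x;
}

int main() {
    const int n = 16;
    std::vector<int> v;

    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < n; ++i) {
        double r = heavy_work(i);   // runs concurrently on all threads
        #pragma omp ordered
        {
            // executed by one thread at a time, in loop order
            v.push_back(i);
            std::printf("thread %d pushed %d (r=%g)\n",
                        omp_get_thread_num(), i, r);
        }
    }
    // v now holds 0, 1, ..., n-1 regardless of the number of threads.
    return 0;
}

Compiled with e.g. g++ -fopenmp, the pushes always appear in increasing order of i, while the heavy_work calls overlap across threads.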
There is no particular reason to use the dynamic schedule instead of the static schedule with a small chunk size; it all depends on the structure of the code. Since ordered introduces a dependency between the threads, with schedule(static) and the default chunk size the second thread has to wait for the first one to finish all of its iterations, then the third thread has to wait for the second one to finish its iterations (and hence for the first one too), and so on. One can easily visualise it with 3 threads and 9 iterations (3 per thread):
tid  List of      Timeline
     iterations
 0   0,1,2        ==o==o==o
 1   3,4,5        ==.......o==o==o
 2   6,7,8        ==..............o==o==o
= shows that the thread is executing code in parallel, o is when the thread is executing the ordered region, and . is the thread being idle, waiting for its turn to execute the ordered region. With schedule(static,1) the following would happen:
tid  List of      Timeline
     iterations
 0   0,3,6        ==o==o==o
 1   1,4,7        ==.o==o==o
 2   2,5,8        ==..o==o==o
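If you want to see the iteration-to-thread mapping from the tables above on your own machine, the following sketch (hypothetical, not part of the original answer) uses schedule(runtime), so either timeline can be reproduced by setting the OMP_SCHEDULE environment variable (e.g. OMP_SCHEDULE="static" versus OMP_SCHEDULE="static,1") without recompiling:

#include <cstdio>
#include <omp.h>

int main() {
    omp_set_num_threads(3);   // match the 3-thread example above

    // The actual schedule kind and chunk size are taken from OMP_SCHEDULE.
    #pragma omp parallel for ordered schedule(runtime)
    for (int i = 0; i < 9; ++i) {
        #pragma omp ordered
        std::printf("iteration %d executed by thread %d\n",
                    i, omp_get_thread_num());
    }
    return 0;
}

Because of the ordered region, the lines are always printed for i = 0 to 8 in order; what changes between the two schedules is which thread number appears next to each iteration.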
I believe the difference in both cases is more than obvious. With schedule(dynamic) the pictures above would become more or less random, as the list of iterations assigned to each thread is non-deterministic. It would also add additional overhead. Dynamic scheduling is only useful when the amount of computation differs from iteration to iteration and the computation takes much more time than the overhead it adds.
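As a rough illustration of that last point, here is a hedged sketch (uneven_work is an invented name and the constants are arbitrary) of the case where dynamic scheduling does pay off: the cost per iteration grows with i, so a plain static partition would leave the threads holding the cheap iterations idle near the end:

#include <cmath>
#include <cstdio>
#include <omp.h>

// Invented example of per-iteration work whose cost grows with i.
static double uneven_work(int i) {
    double x = 0.0;
    for (int k = 0; k < (i + 1) * 20000; ++k)
        x += std::cos(k * 1e-3);
    return x;
}

int main() {
    const int n = 64;
    double sum = 0.0;
    double t0 = omp_get_wtime();

    // Chunks of 4 iterations are handed out on demand, so threads that finish
    // early pick up more of the expensive tail iterations.
    #pragma omp parallel for schedule(dynamic, 4) reduction(+ : sum)
    for (int i = 0; i < n; ++i)
        sum += uneven_work(i);

    std::printf("sum = %g, elapsed = %g s\n", sum, omp_get_wtime() - t0);
    return 0;
}

Swapping schedule(dynamic, 4) for schedule(static) and comparing the elapsed time shows whether the extra bookkeeping of dynamic scheduling is worth it for a given workload.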
Don't worry about the lowest numbered iteration. It is usually handed to the first thread in the team that becomes ready to execute code.