Does OpenMP natively support reduction of a variable that represents an array?
This would work something like the following...
float* a = (float*) calloc(4*sizeof(float));
omp_set_num_threads(13);
#pragma omp parallel reduction(+:a)
for(i=0;i<4;i++){
a[i] += 1; // Thread-local copy of a incremented by something interesting
}
// a now contains [13 13 13 13]
Ideally, there would be something similar for an omp parallel for, and if you have a large enough number of threads for it to make sense, the accumulation would happen via binary tree.
Array reduction is now possible with OpenMP 4.5 for C and C++. Here's an example:
#include <iostream>
int main()
{
int myArray[6] = {};
#pragma omp parallel for reduction(+:myArray[:6])
for (int i=0; i<50; ++i)
{
double a = 2.0; // Or something non-trivial justifying the parallelism...
for (int n = 0; n<6; ++n)
{
myArray[n] += a;
}
}
// Print the array elements to see them summed
for (int n = 0; n<6; ++n)
{
std::cout << myArray[n] << " " << std::endl;
}
}
Outputs:
100
100
100
100
100
100
I compiled this with GCC 6.2. You can see which common compiler versions support the OpenMP 4.5 features here: https://www.openmp.org/resources/openmp-compilers-tools/
Note from the comments above that while this is convenient syntax, it may invoke a lot of overheads from creating copies of each array section for each thread.