I am having a problem with implementing the algorithm for computing a prefix sum in parallel. Even though this algorithm has 3 steps, I am unable to write the code, as no pseudo-code is given.
I went through various material on the web and also on stack overflow, but I didn't get the exact implementation of the algorithm as given on the wiki. The wiki mentions the following:
A prefix sum can be calculated in parallel by the following steps::
- Compute the sums of consecutive pairs of items in which the first item of the pair has an even index: z0 = x0 + x1, z1 = x2 + x3, etc.
- Recursively compute the prefix sum w0, w1, w2, ... of the sequence z0, z1, z2, ...
- Expand each term of the sequence w0, w1, w2, ... into two terms of the overall prefix sum: y0 = x0, y1 = w0, y2 = w0 + x2, y3 = w1, etc. After the first value, each successive number yi is either copied from a position half as far through the w sequence, or is the previous value added to one value in the x sequence
Can someone please suggest a pseudo-code implementation for me to check and implement?
I believe the provided answer is not enough to understand the algorithm, so I provided an actual answer with more comprehensive pseudocode here: https://stackoverflow.com/a/12874227/697862 based on Parallel Prefix Sum (Scan) with CUDA which is a complete article describing the optimum parallel algorithm according to:
Blelloch, Guy E. 1990. "Prefix Sums and Their Applications." Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University.