How to calculate the speedup of a GPU program?

mchen · Jan 15, 2013

Motivation: I have been tasked with measuring the Karp-Flatt metric and parallel efficiency of my CUDA C code, which requires computation of speedup. In particular, I need to plot all these metrics as a function of the number of processors p.

Definition: Speedup measures how much faster a parallel algorithm is than a corresponding sequential algorithm, and is defined as:

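$$S_p = \frac{T_1}{T_p}$$

where p is the number of processors, T_1 is the execution time of the sequential algorithm, and T_p is the execution time of the parallel algorithm on p processors.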

Issue: I have implemented my algorithm in CUDA C and timed it to obtain T_p. However, some issues remain in determining S_p:

  • How can I measure T_1 without completely rewriting my code from scratch?
    • Can I execute CUDA code serially?
  • What is p when I run different kernels with different numbers of threads?
    • Does it refer to the number of threads or the number of processors used during the run?
    • Since both of these quantities also vary during the run, should I use the maximum or the average?
    • How can I restrict my code to run on a subset of processors, or with fewer threads?

Many thanks.

Answer

Ira Baxter · Jan 15, 2013

To get a reasonable measure of speedup, you need the actual sequential program. If you don't have one, you need to write the best sequential version you can, because comparing a highly tuned parallel code to a junk serial implementation is unreasonable.

Nor can you reasonably compare a 1-processor version of your parallel program to the N-processor version to get a true measure of speedup. Such a comparison tells you the speedup from going from P=1 to P=N for the same program, but the point of speedup curves is to show why building a parallel program (which is usually harder and requires more complicated hardware [GPU] and tools [OpenCL]) makes sense compared to coding the best sequential version using more widely available hardware and tools.

In other words, no cheating.