Motivation: I have been tasked with measuring the Karp-Flatt metric and parallel efficiency of my CUDA C code, which requires computation of speedup. In particular, I need to plot all these metrics as a function of the number of processors $p$.
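Definition: Speedup measures how much faster a parallel algorithm is than a corresponding sequential algorithm, and is defined as:

$$S_p = \frac{T_1}{T_p}$$

where $T_1$ is the execution time of the sequential algorithm and $T_p$ is the execution time of the parallel algorithm on $p$ processors.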
Issue: I have implemented my algorithm in CUDA C and have timed it to get $T_p$. However, some issues remain in determining $S_p$:

- How do I obtain $T_1$ without completely rewriting my code from scratch?
- What is $p$ when I run different kernels with different numbers of threads?
Many thanks.
To get a reasonable measure of speedup, you need the actual sequential program. If you don't have one, you need to write the best sequential version you can, because comparing a highly tuned parallel code to a junk serial implementation is unreasonable.
Nor can you reasonably compare a 1-processor version of your parallel program to the N-processor version and call that the true speedup. Such a comparison tells you the speedup from going from P=1 to P=N for the same program. The point of speedup curves, however, is to show why building a parallel program (which is usually harder and requires more complicated hardware [GPU] and tools [OpenCL]) makes sense compared to coding the best sequential version using more widely available hardware and tools.
In other words, no cheating.