Weak vs Strong Scaling Speedup and Efficiency

Chris · May 23, 2013 · Viewed 9.2k times

I have a theoretical question. As you know, for the analysis of scaling, the speedup is defined as S(N) = T(1) / T(N), where T(i) is the runtime with i processors. The efficiency is then defined as E(N) = S(N) / N. These definitions make perfect sense for strong scaling.
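For concreteness, here is a minimal sketch of these strong scaling formulas in Python; the timings and the function names are mine, chosen for illustration, not taken from any library:

```python
def speedup(t1: float, tn: float) -> float:
    """Strong scaling speedup: S(N) = T(1) / T(N)."""
    return t1 / tn

def efficiency(t1: float, tn: float, n: int) -> float:
    """Strong scaling efficiency: E(N) = S(N) / N."""
    return speedup(t1, tn) / n

# Hypothetical strong scaling run: fixed problem size, 8 processors.
t1, t8 = 100.0, 14.0          # seconds (made-up measurements)
print(speedup(t1, t8))        # ~7.14, close to the ideal of 8
print(efficiency(t1, t8, 8))  # ~0.89, i.e. 89% efficiency
```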

Right now, I am trying to compute the weak scaling efficiency of my program, and in this case the following problem occurs: these formulas are nonsense for weak scaling. Weak scaling means that the workload per processor stays the same while the number of processors is increased (and thus the total problem size as well).

Using the formulas above, a perfectly scaling program would have T(N) = T(1), hence a speedup of S(N) = T(1) / T(1) = 1 and an efficiency of E(N) = 1/N - which of course is completely unintuitive.

It would seem more appropriate to define the weak scaling efficiency as E(N) = S(1) / S(N). So here is the actual question: how is weak scaling efficiency generally defined? Is it, as I said, defined in the way that would make more sense?

I tried to find out, but all I got were the well-known formulas above, probably implicitly intended only for strong scaling.

Answer

Timmie Smith · May 26, 2013

If you assume the time required for the computation shouldn't increase as the number of processors increases -- which may only be true in embarrassingly parallel problems -- weak scaling efficiency is defined as E(N) = T(1)/T(N).

For example, if every time the number of processors used is doubled the execution time increases by 25% of T(1), then T(16) = T(1) + 0.25 * 4 * T(1) = 2 * T(1) and E(16) = T(1) / (2 * T(1)) = 0.5, or 50%.
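A small sketch that reproduces this arithmetic, assuming the 25%-per-doubling runtime growth described above; the helper names and numbers are hypothetical:

```python
import math

def weak_efficiency(t1: float, tn: float) -> float:
    """Weak scaling efficiency: E(N) = T(1) / T(N)."""
    return t1 / tn

def runtime_with_growth(t1: float, n: int, growth: float = 0.25) -> float:
    """Runtime when each doubling of the processor count adds growth * T(1)."""
    doublings = math.log2(n)
    return t1 + growth * doublings * t1

t1 = 100.0                           # hypothetical single-processor runtime
t16 = runtime_with_growth(t1, 16)    # 4 doublings -> T(16) = 2 * T(1) = 200.0
print(weak_efficiency(t1, t16))      # 0.5, i.e. 50%
```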

A "speedup" in a weak scaling study doesn't make sense. You refer instead to the percentage of time increase as the number of processors increase.

Finally, a minor nit: speedup is defined as the ratio of the execution time of the best known sequential algorithm to the execution time of the parallel implementation. What you're working with is scalability. This is an important distinction, because parallel algorithms often use implementations that are sequentially suboptimal.