Where to call .AsParallel() in a LINQ query

user7061022 picture user7061022 · Oct 23, 2016 · Viewed 14.4k times · Source

The question

In a LINQ query I can correctly (as in: the compiler won't complain) call .AsParallel() like this:

(from l in list.AsParallel() where <some_clause> select l).ToList();

or like this:

(from l in list where <some_clause> select l).AsParallel().ToList();

what exactly is the difference?

What I've tried

Judging from the official documentation I've almost always seen the first method used so I thought that was the way to go.
Today tho, I've tried to run some benchmark myself and the result was surprising. Here's the code I've run:

var list = new List<int>();
var rand = new Random();
for (int i = 0; i < 100000; i++)
    list.Add(rand.Next());

var treshold= 1497234;

var sw = new Stopwatch();

sw.Restart();
var result = (from l in list.AsParallel() where l > treshold select l).ToList();
sw.Stop();

Console.WriteLine($"call .AsParallel() before: {sw.ElapsedMilliseconds}");

sw.Restart();
result = (from l in list where l > treshold select l).AsParallel().ToList();
sw.Stop();

Console.WriteLine($"call .AsParallel() after: {sw.ElapsedMilliseconds}");

Output

call .AsParallel() before: 49
call .AsParallel() after: 4

So, apparently, despite what the documentation says, the second method is much faster. What's exactly happening here?

Answer

Sergey Kalinichenko picture Sergey Kalinichenko · Oct 23, 2016

The trick to using AsParallel in general is to decide if the savings from parallelism outweigh the overhead of doing things in parallel.

When conditions are easy to evaluate, such as yours, the overhead of making multiple parallel streams and collecting their results at the end greatly outweigh the benefit of performing comparisons in parallel.

When conditions are computationally intense, making AsParallel call early speeds things up quite a bit, because the overhead is now small in comparison to the benefit of running multiple Where computations in parallel.

For an example of a computationally hard condition, consider a method that decides whether a number is prime or not. Doing this in parallel on a multi-core CPU will show significant improvement over the non-parallelised implementation.