Parallel Linq query optimization

David K. · Feb 4, 2012

For some time now I've been structuring my code around methods with no side effects so that I can use Parallel LINQ (PLINQ) to speed things up. Along the way I've more than once stumbled on lazy evaluation making things worse instead of better, and I would like to know whether there are any tools that help with optimizing PLINQ queries.

I ask because I recently refactored some embarrassingly parallel code by modifying some methods and peppering AsParallel in certain key places. The run time went down from 2 minutes to 45 seconds, but it was clear from the performance monitor that in some places not all of the CPU cores were being fully utilized. After a few false starts I forced some of the queries to execute by using ToArray, and the run time went down even further, to 16 seconds. It felt good to reduce the run time, but it was also slightly disconcerting because it was not clear where in the code the queries needed to be forced with ToArray. Waiting until the last minute for the query to execute was not the optimal strategy, but it was not at all clear at which points some of the subqueries had to be forced in order to utilize all the CPU cores.

As it is, I have no idea where to place ToArray or other methods that force LINQ computations to execute in order to get maximum CPU utilization. So are there any general guidelines or tools for optimizing PLINQ queries?

Here's a pseudo-code sample:

var firstQuery = someDictionary.SelectMany(FirstTransformation);
var secondQuery = firstQuery.Select(SecondTransformation);
var thirdQuery = secondQuery.Select(ThirdTransformation).Where(SomeConditionCheck);
var finalQuery = thirdQuery.Select(FinalTransformation).Where(x => x != null);

FirstTransformation, SecondTransformation, and ThirdTransformation are all CPU-bound; in terms of complexity, each is a few 3x3 matrix multiplications and some if branches. SomeConditionCheck is pretty much a null check. FinalTransformation is the most CPU-intensive part of the code: it performs a whole bunch of line-plane intersections, checks polygon containment for those intersections, and then extracts the intersection that is closest to a certain point on the line.

I have no idea why the places where I put AsParallel reduced the run time as much as they did. I have reached a local minimum in terms of run time, but it was just dumb luck that I stumbled on it. In case you're wondering, the places to put AsParallel are the first and last lines. Putting AsParallel anywhere else only increases the run time, sometimes by up to 20 seconds. There is also a ToArray hidden on the first line.

Answer

Joe Strommen · Feb 6, 2012

There are a couple of things going on here:

  1. PLINQ parallelizes collections with a known length more efficiently than uncounted IEnumerables. If you have an array, it divides the array length by your number of CPU cores and hands each core an even share. But if you have an IEnumerable of unknown length, it does a goofy exponential ramp-up, where tasks process 1, 2, 4, 8, etc. elements at a time until they hit the end of the IEnumerable.
  2. By parallelizing all your queries, you're breaking the work up into tiny chunks. If you have M parallel queries across N elements, you end up with roughly M*N tasks. There's more thread overhead in this than if you parallelize only the last query, in which case you'd end up with roughly N tasks.
  3. PLINQ does best when each task takes roughly the same amount of time to process. That way it can divide the tasks evenly among the cores. By parallelizing each of your queries, which all have different performance behavior, you get M*N tasks that take varying amounts of time, and PLINQ is not able to schedule them optimally (because it doesn't know ahead of time how long each one might take).
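
To make points 1 and 3 a bit more concrete, here is a minimal sketch rather than a benchmark. PartitioningSketch, Work, Lazy, and the three Sum methods are hypothetical names invented for illustration, and Work merely stands in for a CPU-bound, side-effect-free transformation. The first method hands PLINQ a lazily produced sequence of unknown length, the second materializes it with ToArray first, and the third additionally asks for a load-balancing partitioner, which may help when elements take uneven amounts of time. All three return the same value; only the way the work gets split up may differ, so it is worth timing them on your own workload.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class PartitioningSketch
{
    // Hypothetical stand-in for a CPU-bound transformation with no side effects.
    public static long Work(int x)
    {
        long acc = x;
        for (int i = 0; i < 1000; i++)
            acc = (acc * 31 + i) % 1000003;
        return acc;
    }

    // A lazily generated sequence: its length is not known up front.
    private static IEnumerable<int> Lazy(int count)
    {
        for (int i = 0; i < count; i++)
            yield return i;
    }

    // AsParallel directly over an uncounted IEnumerable.
    public static long SumFromEnumerable(int count)
    {
        return Lazy(count).AsParallel().Select(x => Work(x)).Sum();
    }

    // Materialize first, so PLINQ sees an indexable source of known length.
    public static long SumFromArray(int count)
    {
        int[] source = Lazy(count).ToArray();
        return source.AsParallel().Select(x => Work(x)).Sum();
    }

    // Same array, but with a partitioner that balances load dynamically
    // instead of splitting the array statically (loadBalance: true).
    public static long SumLoadBalanced(int count)
    {
        int[] source = Lazy(count).ToArray();
        return Partitioner.Create(source, true)
                          .AsParallel()
                          .Select(x => Work(x))
                          .Sum();
    }
}
```

Calling PartitioningSketch.SumFromArray(10000) and the other two methods with the same argument should give identical sums; any difference in run time comes from partitioning and scheduling, not from the work itself.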

So the overall guideline here is: make sure you start from an array if possible, and only put AsParallel on the very last query before evaluation. Something like the following should work pretty well:

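// Start from an array so PLINQ knows the length and can split it evenly.
// Only operators written after AsParallel run in parallel.
var firstQuery = someDictionary.ToArray().AsParallel().SelectMany(FirstTransformation);
var secondQuery = firstQuery.Select(SecondTransformation);
// ToArray forces the cheap stages and hands the expensive stage an array.
var thirdQuery = secondQuery.Select(ThirdTransformation).Where(SomeConditionCheck).ToArray();
var finalQuery = thirdQuery.AsParallel().Select(FinalTransformation).Where(x => x != null);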
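
Since it is hard to predict where forcing evaluation pays off, one low-tech option is to time each forced stage separately. The helper below is only a sketch built on System.Diagnostics.Stopwatch; StageTimer and Time are hypothetical names, not part of PLINQ. Because the query is built and forced inside the delegate, the measured time covers every deferred operator feeding into that stage, not just the last one.

```csharp
using System;
using System.Diagnostics;

public static class StageTimer
{
    // Runs a delegate that builds and forces a query (for example with
    // ToArray), prints how long it took, and returns the materialized result.
    public static T[] Time<T>(string label, Func<T[]> force)
    {
        Stopwatch watch = Stopwatch.StartNew();
        T[] result = force();
        watch.Stop();
        Console.WriteLine("{0}: {1} items in {2} ms",
                          label, result.Length, watch.ElapsedMilliseconds);
        return result;
    }
}

// Hypothetical usage with the queries above:
//   var thirdQuery = StageTimer.Time("stages 1-3",
//       () => secondQuery.Select(ThirdTransformation).Where(SomeConditionCheck).ToArray());
```

Comparing the per-stage numbers with and without an intermediate ToArray gives a rough idea of which stage is starving the cores, though wall-clock timings like these are noisy and should be repeated a few times.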