What guarantees are there on the run-time complexity (Big-O) of LINQ methods?

c# .net linq algorithm complexity-theory

tzaman · May 10, 2010 · Viewed 27.1k times · Source

I've recently started using LINQ quite a bit, and I haven't really seen any mention of run-time complexity for any of the LINQ methods. Obviously, there are many factors at play here, so let's restrict the discussion to the plain IEnumerable LINQ-to-Objects provider. Further, let's assume that any Func passed in as a selector / mutator / etc. is a cheap O(1) operation.

It seems obvious that all the single-pass operations (Select, Where, Count, Take/Skip, Any/All, etc.) will be O(n), since they only need to walk the sequence once; although even this is subject to laziness.

Things are murkier for the more complex operations; the set-like operators (Union, Distinct, Except, etc.) work using GetHashCode by default (afaik), so it seems reasonable to assume they're using a hash-table internally, making these operations O(n) as well, in general. What about the versions that use an IEqualityComparer?

OrderBy would need a sort, so most likely we're looking at O(n log n). What if it's already sorted? How about if I say OrderBy().ThenBy() and provide the same key to both?

I could see GroupBy (and Join) using either sorting, or hashing. Which is it?

Contains would be O(n) on a List, but O(1) on a HashSet - does LINQ check the underlying container to see if it can speed things up?

And the real question - so far, I've been taking it on faith that the operations are performant. However, can I bank on that? STL containers, for example, clearly specify the complexity of every operation. Are there any similar guarantees on LINQ performance in the .NET library specification?

More question (in response to comments):
Hadn't really thought about overhead, but I didn't expect there to be very much for simple Linq-to-Objects. The CodingHorror post is talking about Linq-to-SQL, where I can understand parsing the query and making SQL would add cost - is there a similar cost for the Objects provider too? If so, is it different if you're using the declarative or functional syntax?

Answer

There are very, very few guarantees, but there are a few optimizations:

Extension methods that use indexed access, such as ElementAt, Skip, Last or LastOrDefault, will check to see whether or not the underlying type implements IList<T>, so that you get O(1) access instead of O(N).
The Count method checks for an ICollection implementation, so that this operation is O(1) instead of O(N).
Distinct, GroupBy Join, and I believe also the set-aggregation methods (Union, Intersect and Except) use hashing, so they should be close to O(N) instead of O(N²).
Contains checks for an ICollection implementation, so it may be O(1) if the underlying collection is also O(1), such as a HashSet<T>, but this is depends on the actual data structure and is not guaranteed. Hash sets override the Contains method, that's why they are O(1).
OrderBy methods use a stable quicksort, so they're O(N log N) average case.

I think that covers most if not all of the built-in extension methods. There really are very few performance guarantees; Linq itself will try to take advantage of efficient data structures but it isn't a free pass to write potentially inefficient code.

What guarantees are there on the run-time complexity (Big-O) of LINQ methods?

Answer

Related questions