I've recently started using LINQ quite a bit, and I haven't really seen any mention of run-time complexity for any of the LINQ methods. Obviously, there are many factors at play here, so let's restrict the discussion to the plain IEnumerable
LINQ-to-Objects provider. Further, let's assume that any Func
passed in as a selector / mutator / etc. is a cheap O(1) operation.
It seems obvious that all the single-pass operations (Select
, Where
, Count
, Take/Skip
, Any/All
, etc.) will be O(n), since they only need to walk the sequence once; although even this is subject to laziness.
Things are murkier for the more complex operations; the set-like operators (Union
, Distinct
, Except
, etc.) work using GetHashCode
by default (afaik), so it seems reasonable to assume they're using a hash-table internally, making these operations O(n) as well, in general. What about the versions that use an IEqualityComparer
?
OrderBy
would need a sort, so most likely we're looking at O(n log n). What if it's already sorted? How about if I say OrderBy().ThenBy()
and provide the same key to both?
I could see GroupBy
(and Join
) using either sorting, or hashing. Which is it?
Contains
would be O(n) on a List
, but O(1) on a HashSet
- does LINQ check the underlying container to see if it can speed things up?
And the real question - so far, I've been taking it on faith that the operations are performant. However, can I bank on that? STL containers, for example, clearly specify the complexity of every operation. Are there any similar guarantees on LINQ performance in the .NET library specification?
More question (in response to comments):
Hadn't really thought about overhead, but I didn't expect there to be very much for simple Linq-to-Objects. The CodingHorror post is talking about Linq-to-SQL, where I can understand parsing the query and making SQL would add cost - is there a similar cost for the Objects provider too? If so, is it different if you're using the declarative or functional syntax?
There are very, very few guarantees, but there are a few optimizations:
Extension methods that use indexed access, such as ElementAt
, Skip
, Last
or LastOrDefault
, will check to see whether or not the underlying type implements IList<T>
, so that you get O(1) access instead of O(N).
The Count
method checks for an ICollection
implementation, so that this operation is O(1) instead of O(N).
Distinct
, GroupBy
Join
, and I believe also the set-aggregation methods (Union
, Intersect
and Except
) use hashing, so they should be close to O(N) instead of O(N²).
Contains
checks for an ICollection
implementation, so it may be O(1) if the underlying collection is also O(1), such as a HashSet<T>
, but this is depends on the actual data structure and is not guaranteed. Hash sets override the Contains
method, that's why they are O(1).
OrderBy
methods use a stable quicksort, so they're O(N log N) average case.
I think that covers most if not all of the built-in extension methods. There really are very few performance guarantees; Linq itself will try to take advantage of efficient data structures but it isn't a free pass to write potentially inefficient code.