I always assumed that if I was using Select(x => ...) in the context of LINQ to Objects, then the new collection would be created immediately and remain static. I'm not quite sure WHY I assumed this, and it's a very bad assumption, but I did. I often use .ToList() elsewhere, but often not in this case.
This code demonstrates that even a simple 'Select' is subject to deferred execution:
var random = new Random();
var animals = new[] { "cat", "dog", "mouse" };
var randomNumberOfAnimals = animals.Select(x => Math.Floor(random.NextDouble() * 100) + " " + x + "s");
foreach (var i in randomNumberOfAnimals)
{
    testContextInstance.WriteLine("There are " + i);
}
foreach (var i in randomNumberOfAnimals)
{
    testContextInstance.WriteLine("And now, there are " + i);
}
This outputs the following (the random function is called every time the collection is iterated through):
There are 75 cats
There are 28 dogs
There are 62 mouses
And now, there are 78 cats
And now, there are 69 dogs
And now, there are 43 mouses
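The same effect can be shown without random numbers by counting how often the selector runs. This is only a minimal sketch: the class DeferredDemo and the helper CountSelectorCalls are hypothetical names made up for illustration, not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Hypothetical helper: enumerates the sequence twice and returns
    // how many times the selector lambda was invoked in total.
    public static int CountSelectorCalls(bool materialize)
    {
        var calls = 0;
        var animals = new[] { "cat", "dog", "mouse" };

        // The lambda does not run here; Select only builds an iterator.
        IEnumerable<string> query = animals.Select(x => { calls++; return x + "s"; });

        if (materialize)
        {
            // ToList() enumerates once and stores the results.
            query = query.ToList();
        }

        foreach (var s in query) { }
        foreach (var s in query) { }
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountSelectorCalls(materialize: false)); // 6
        Console.WriteLine(CountSelectorCalls(materialize: true));  // 3
    }
}
```

Without ToList() the selector runs once per element per enumeration; with it, the selector runs once per element in total.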
I have many places where I have an IEnumerable<T> as a member of a class. Often the results of a LINQ query are assigned to such an IEnumerable<T>. Normally for me, this does not cause issues, but I have recently found a few places in my code where it poses more than just a performance issue.
In trying to check for places where I had made this mistake, I thought I could check to see if a particular IEnumerable<T> was of type IQueryable. This, I thought, would tell me if the collection was 'deferred' or not. It turns out that the enumerator created by the Select operator above is of type System.Linq.Enumerable+WhereSelectArrayIterator`2[System.String,System.String] and not IQueryable.
I used Reflector to see what this type inherited from, and it turns out not to inherit from anything that indicates it is 'LINQ' at all - so there is no way to test based upon the collection type.
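A small sketch of why the IQueryable test tells you nothing here. The class name TypeCheckDemo is made up for illustration, and the concrete iterator type name printed in the middle depends on the .NET version, so no output is claimed for that line.

```csharp
using System;
using System.Linq;

public static class TypeCheckDemo
{
    public static void Main()
    {
        var animals = new[] { "cat", "dog", "mouse" };
        var query = animals.Select(x => x + "s");

        // A deferred LINQ to Objects query is not an IQueryable...
        Console.WriteLine(query is IQueryable);           // False

        // ...it is a private iterator class nested inside Enumerable.
        Console.WriteLine(query.GetType().FullName);

        // A fully materialized array is not an IQueryable either,
        // so the test cannot tell the two cases apart.
        Console.WriteLine(query.ToArray() is IQueryable); // False
    }
}
```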
I'm quite happy putting .ToArray() everywhere now, but I'd like to have a mechanism to make sure this problem doesn't happen in future. Visual Studio seems to know how to do it, because it gives a message about 'expanding the results view will evaluate the collection.'
The best I have come up with is:
bool deferred = !object.ReferenceEquals(randomNumberOfAnimals.First(),
randomNumberOfAnimals.First());
Edit: This only works if a new object is created by 'Select', so it is not a general solution. I'm not recommending it in any case, though! It was a somewhat tongue-in-cheek solution.
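To make that limitation concrete, here is a sketch that wraps the check in a helper and runs it against three sequences. The names ReferenceCheckDemo and LooksDeferred are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReferenceCheckDemo
{
    // The check from above, wrapped in a hypothetical helper.
    public static bool LooksDeferred<T>(IEnumerable<T> source) where T : class
    {
        return !object.ReferenceEquals(source.First(), source.First());
    }

    public static void Main()
    {
        var animals = new[] { "cat", "dog", "mouse" };

        // Builds a new string on every enumeration, so the check fires.
        var plural = animals.Select(x => x + "s");
        Console.WriteLine(LooksDeferred(plural));          // True

        // Still deferred, but it hands back the original references,
        // so the check misses it.
        var same = animals.Select(x => x);
        Console.WriteLine(LooksDeferred(same));            // False

        // Materialized, and correctly reported as not deferred.
        Console.WriteLine(LooksDeferred(plural.ToList())); // False
    }
}
```

The second case is the false negative: the query is re-evaluated on every enumeration, yet the check reports it as not deferred.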
Deferred execution of LINQ has trapped a lot of people; you're not alone.
The approach I've taken to avoiding this problem is as follows:
Parameters to methods - use IEnumerable<T> unless there's a need for a more specific interface.
Local variables - usually at the point where I create the LINQ, so I'll know whether lazy evaluation is possible.
Class members - never use IEnumerable<T>, always use List<T>. And always make them private.
Properties - use IEnumerable<T>, and convert for storage in the setter.
public IEnumerable<Person> People
{
    get { return people; }
    set { people = value.ToList(); } // evaluates any deferred query once, here
}
private List<Person> people;
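Here is a runnable sketch of that property pattern, showing that a lazy query assigned to the property is evaluated exactly once. The Team class, the Name property on Person, and the SnapshotDemo driver are assumptions added for illustration; only the People property itself comes from the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
}

// Hypothetical owner class for the People property.
public class Team
{
    private List<Person> people = new List<Person>();

    public IEnumerable<Person> People
    {
        get { return people; }
        set { people = value.ToList(); } // the query is evaluated once, here
    }
}

public static class SnapshotDemo
{
    public static void Main()
    {
        var created = 0;
        var names = new[] { "Ann", "Bob" };

        // Lazy query: each enumeration would create fresh Person objects.
        var query = names.Select(n => { created++; return new Person { Name = n }; });

        var team = new Team { People = query };

        // Reading the property twice does not re-run the selector.
        var first = team.People.First();
        var again = team.People.First();

        Console.WriteLine(created);                              // 2
        Console.WriteLine(object.ReferenceEquals(first, again)); // True
    }
}
```

One thing to keep in mind with this sketch: assigning null to the property throws, because ToList() rejects a null source.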
While there are theoretical cases where this approach wouldn't work, I've not run into one yet, and I've been enthusiastically using the LINQ extension methods since late Beta.
BTW: I'm curious why you use ToArray() instead of ToList() - to me, lists have a much nicer API, and there's (almost) no performance cost.
Update: A couple of commenters have rightly pointed out that arrays have a theoretical performance advantage, so I've amended my statement above to "... there's (almost) no performance cost."
Update 2: I wrote some code to do some micro-benchmarking of the difference in performance between Arrays and Lists. On my laptop, and in my specific benchmark, the difference is around 5ns (that's nanoseconds) per access. I guess there are cases where saving 5ns per loop would be worthwhile ... but I've never come across one. I had to hike my test up to 100 million iterations before the runtime became long enough to accurately measure.
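For anyone who wants to try something similar, the following is a rough sketch of such a micro-benchmark. It is not the code I actually ran: the names AccessBenchmark, SumArray and SumList and the sizes are assumptions picked to reach 100 million reads, and results will vary by machine, runtime and build configuration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class AccessBenchmark
{
    // Reads every element of the array by index, 'passes' times over.
    public static long SumArray(int[] items, int passes)
    {
        long sum = 0;
        for (var p = 0; p < passes; p++)
            for (var i = 0; i < items.Length; i++)
                sum += items[i];
        return sum;
    }

    // Same loop, but going through the List<int> indexer.
    public static long SumList(List<int> items, int passes)
    {
        long sum = 0;
        for (var p = 0; p < passes; p++)
            for (var i = 0; i < items.Count; i++)
                sum += items[i];
        return sum;
    }

    public static void Main()
    {
        const int size = 1000;
        const int passes = 100000; // 1,000 x 100,000 = 100 million reads

        var array = Enumerable.Range(0, size).ToArray();
        var list = array.ToList();

        var sw = Stopwatch.StartNew();
        var arraySum = SumArray(array, passes);
        var arrayMs = sw.ElapsedMilliseconds;

        sw.Restart();
        var listSum = SumList(list, passes);
        var listMs = sw.ElapsedMilliseconds;

        // Using the sums in the output keeps the loops from being discarded.
        Console.WriteLine("array: {0} ms (sum {1})", arrayMs, arraySum);
        Console.WriteLine("list:  {0} ms (sum {1})", listMs, listSum);
    }
}
```

Run it as a release build and more than once; the first pass includes JIT compilation, which can easily swamp a difference of a few nanoseconds per access.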