EF: Lazy loading, eager loading, and "enumerating the enumerable"

Cynthia picture Cynthia · Sep 16, 2010 · Viewed 15.3k times · Source

I find I'm confused about lazy loading, etc.

First, are these two statements equivalent:

(1) Lazy loading:
_flaggedDates = context.FlaggedDates.Include("scheduledSchools")
.Include  ("interviews").Include("partialDayAvailableBlocks")
.Include("visit").Include("events");

(2) Eager loading:
_flaggedDates = context.FlaggedDates;

In other words, in (1) the "Includes" cause the navigation collections/properties to be loaded along with the specific collection requested, regardless of the fact that you are using lazy loading ... right?

And in (2), the statement will load all the navigation entities even though you do not specifically request them, because you are using eager loading ... right?

Second: even if you are using eager loading, the data will not actually be downloaded from the database until you "enumerate the enumerable", as in the following code:

var dates = from d in _flaggedDates
            where d.dateID = 2
            select d;
foreach (FlaggedDate date in dates)
{
... etc.
}

The data will not actually be downloaded ("enumerated") until the foreach loop ... right? In other words, the "var dates" line defines the query, but the query is not executed until the foreach loop.

Given that (if my assumptions are correct), what's the real difference between eager loading and lazy loading?? It seems that in either case, the data does not appear until the enumeration. Am I missing something?

(My specific experience is with code-first, POCO development, by the way ... though the questions may apply more generally.)

Answer

StriplingWarrior picture StriplingWarrior · Sep 16, 2010

Your description of (1) is correct, but it is an example of Eager Loading rather than Lazy Loading.

Your description of (2) is incorrect. (2) is technically using no loading at all, but will use Lazy Loading if you try to access any non-scalar values on your FlaggedDates.

In either case, you are correct that no data will be loaded from your data store until you attempt to "do something" with the _flaggedDates. However, what happens is different in each case.

(1): Eager loading: as soon as you begin your for loop, every one of the objects that you have specified will get pulled from the database and built into a gigantic in-memory data structure. This will be a very expensive operation, pulling an enormous amount of data from your database. However, it will all happen in one database round trip, with a single SQL query getting executed.

(2): Lazy loading: When your for loop begins, it will only load the FlaggedDates objects. However, if you access related objects inside your for loop, it will not have those objects loaded into memory yet. The first attempt to retrieve the scheduledSchools for a given FlaggedDate will result in either a new database roundtrip to retrieve the schools, or an Exception being thrown because your context has already been disposed. Since you'd be accessing the scheduledSchools collection inside a for loop, you would have a new database round trip for every FlaggedDate that you initially loaded at the beginning of the for loop.

Reponse to Comments

Disabling Lazy Loading is not the same as Enabling Eager Loading. In this example:

context.ContextOptions.LazyLoadingEnabled = false;
var schools = context.FlaggedDates.First().scheduledSchools;

The schools variable will contain an empty EntityCollection, because I didn't Include them in the original query (FlaggedDates.First()), and I disabled lazy loading so that they couldn't be loaded after the initial query had been executed.

You are correct that the where d.dateID == 2 would mean that only the objects related to that specific FlaggedDate object would be pulled in. However, depending on how many objects are related to that FlaggedDate, you could still end up with a lot of data going over that wire. This is due to the way the EntityFramework builds out its SQL query. SQL Query results are always in a tabular format, meaning you must have the same number of columns for every row. For every scheduledSchool object, there needs to be at least one row in the result set, and since every row has to contain at least some value for every column, you end up with every scalar value on your FlaggedDate object being repeated. So if you have 10 scheduledSchools and 10 interviews associated with your FlaggedDate, you'll end up with 20 rows that each contain every scalar value on FlaggedDate. Half of the rows will have null values for all the ScheduledSchool columns, and the other half will have null values for all of the Interviews columns.

Where this gets really bad, though, is if you go "deep" in the data you're including. For example, if each ScheduledSchool had a students property, which you included as well, then suddenly you would have a row for each Student in each ScheduledSchool, and on each of those rows, every scalar value for the Student's ScheduledSchool would be included (even though only the first row's values end up getting used), along with every scalar value on the original FlaggedDate object. It can add up quickly.

It's difficult to explain in writing, but if you look at the actual data coming back from a query with multiple Includes, you will see that there is a lot of duplicate data. You can use LinqPad to see the SQL Queries generated by your EF code.