Proper way to construct linq queries to achieve fastest performance?

fiberOptics picture fiberOptics · Oct 25, 2012 · Viewed 8.2k times · Source

Similar question have been asked here but none fits to my need.
I made test cases to see which is faster. But I feel my linq code is still slow. How do I construct the linq code for faster performance?

Others said using double .Tolist() can cause slower operation, when I test it, it shows that it is faster than any other tests.

Test:

Preparation
---------------------------------------------------------------
return Properties of UserInfo(userinf, true){
    UserID = userinf.UserID;
    FirstName = userinf.user.FirstName;
    MiddleName = userinf.user.MiddleName;
    LastName = userinf.user.LastName;
    LoginID = userinf.user.LoginID;
    Birthday = userinf.Birthday;
}

skip = 0;
take = 100;

total table records = 304;

Linq to Entity Framework 

Fiddler: v2.4.0.0

https://127.0.0.1/..../RetrieveUserInfo?skip=0&take=100
{
    "client":{
        "SessionID":"5433ab64-7e0d-444f-b886-a901ea9a0601"
    },
    "session":{
        "SessionID":"35b75daa-25ad-45a4-9f99-0e69ec3b66a4"
    }
}

//Test 1
//1) 00:00:15.3068755 -- Attempt1
//2) 00:00:13.8207905 -- Attempt2
//3) 00:00:16.2489294 -- Attempt3

var list = (from usr in dbase.userinfoes
            select usr).OrderBy(i => i.UserID).Skip(skip).Take(take).ToList();

userlist = (from i in list
            select new UserInfo(i, true)).ToList();


///Test 2
//1) 00:00:15.3908803
//2) 00:00:14.8818512
//3) 00:00:19.4761140

var list = (from usr in dbase.userinfoes.AsEnumerable().OrderBy(i => i.UserID).Skip(skip).Take(take).ToList()
            select new UserInfo(usr, true)).ToList();


//Test 3
//1) 00:00:30.1937270
//2) 00:00:24.1003784
//3) 00:00:28.8806519

var list = dbase.userinfoes.OrderBy(i => i.UserID).Skip(skip).Take(take).ToList();
userlist = (from i in list select new UserInfo(i, true)).ToList();


//Test 4
//1) 00:00:57.2652754
//2) 00:00:54.4051118
//3) 00:00:55.3251644

var list = (from usr in dbase.userinfoes
            select usr).ToList();

userlist = (from i in list
            select new UserInfo(i, true)).OrderBy(i => i.UserID).Skip(skip).Take(take).ToList();


//Test 5
//1) 00:01:06.8378229
//2) 00:01:01.2845053
//3) 00:00:55.0721499

var list = from usr in dbase.userinfoes
           select usr;

userlist = (from i in list.AsEnumerable()
            select new UserInfo(i, true)).OrderBy(i => i.UserID).Skip(skip).Take(take).ToList();


// Test 6
// VERY LONG. It tooks all records first and construct UserInfo one by one before doing the skip and take

var list = (from usr in dbase.userinfoes.AsEnumerable()
            select new UserInfo(usr, true)).OrderBy(i => i.UserID).Skip(skip).Take(take).ToList();


//Test 7
// VERY LONG. It tooks all records first and construct UserInfo one by one before doing the skip and take

var list = from usr in dbase.userinfoes.AsEnumerable()
           select new UserInfo(usr);

Proper code for faster search. Thanks to casperOne for pointing out that the ordering, skip and take are all performed on the server is more faster.

Here is the final code:

var list = (from usr in dbase.userinfoes
                .OrderBy(i => i.UserID)
                .Skip(skip)
                .Take(take)
                .AsEnumerable()
            select new UserInfo(usr, true)).ToList();   

1) 00:00:10.9210513
2) 00:00:10.8270973
3) 00:00:10.8250151

Thanks to Richard Neil Ilagan for the final code.

Answer

casperOne picture casperOne · Oct 25, 2012

Here's why each one is performing as it is and why you're seeing what you're seeing:

Test 1:

///Test 2
//1) 00:00:15.3068755 -- Attempt1
//2) 00:00:13.8207905 -- Attempt2
//3) 00:00:16.2489294 -- Attempt3

var list = (from usr in dbase.userinfoes
            select usr).OrderBy(i => i.UserID).Skip(skip).Take(take).ToList();

userlist = (from i in list
            select new UserInfo(i, true)).ToList();

This is absolutely the fastest. It's the fastest because the ordering, skip and take are all performed on the server. Because you probably have indexes, the server is beefy (powerful), etc. it can handle these operations much faster than if you materialized the whole set on the client and then performed the operations there.

UserInfo constructed only on the post-processed list.

Test 2:

///Test 2
//1) 00:00:15.3908803
//2) 00:00:14.8818512
//3) 00:00:19.4761140

var list = (
    from usr in dbase.userinfoes.AsEnumerable().
        OrderBy(i => i.UserID).Skip(skip).Take(take).ToList()
    select new UserInfo(usr, true)
).ToList();

This should have a performance impact the same as test 7; the call to AsEnumerable is forcing all subsequent operations to be performed in memory (and the call to OrderBy is going to require you to materialize all of the instances before they're ordered).

This is a bit if an anomaly. I'd be curious to see what the SQL sent to the server is (assuming you're using SQL server or some SQL-based back end), to make sure that it's selecting all the records back.

UserInfo constructed only on the post-processed list.

Test 3:

//Test 3
//1) 00:00:30.1937270
//2) 00:00:24.1003784
//3) 00:00:28.8806519

var list = dbase.userinfoes.OrderBy(i => i.UserID).
    Skip(skip).Take(take).ToList();
userlist = (from i in list select new UserInfo(i, true)).ToList();

Again, the order by, skip, and take are taking place on the server. You're materializing the list twice (you have two calls to ToList), which is the only explanation for the overhead I can see).

UserInfo constructed only on the post-processed list.

Test 4:

//Test 4
//1) 00:00:57.2652754
//2) 00:00:54.4051118
//3) 00:00:55.3251644

var list = (from usr in dbase.userinfoes
            select usr).ToList();

userlist = (from i in list select new UserInfo(i, true)).
    OrderBy(i => i.UserID).Skip(skip).Take(take).ToList();

You're materializing the entire list in memory now, so there's more overhead now.

UserInfo constructed on the pre-processed list.

Test 5:

//Test 5
//1) 00:01:06.8378229
//2) 00:01:01.2845053
//3) 00:00:55.0721499

var list = from usr in dbase.userinfoes
           select usr;

userlist = (from i in list.AsEnumerable()
            select new UserInfo(i, true)).
    OrderBy(i => i.UserID).Skip(skip).Take(take).ToList();

Same as test two, you're doing all the operations on the client side.

UserInfo constructed on the pre-processed list.

Test 6:

// Test 6
// VERY LONG. It tooks all records first and construct 
// UserInfo one by one before doing the skip and take

var list = (from usr in dbase.userinfoes.AsEnumerable()
            select new UserInfo(usr, true)).
    OrderBy(i => i.UserID).Skip(skip).Take(take).ToList();

UserInfo constructed on the pre-processed list.

Again, doing all the operations on the client side.

// Test 7
// VERY LONG. It tooks all records first and construct 
// UserInfo one by one before doing the skip and take

var list = from usr in dbase.userinfoes.AsEnumerable()
           select new UserInfo(usr);

Again, doing all the operations on the client side.

UserInfo constructed on the pre-processed list.

There is one difference I notice in all of these tests, and that's where you call the constructor for the UserInfo instance. In the places where the performance is good, you put off constructing the instance of UserInfo as late as possible (after you perform order, take, skip operations), whereas when the performance is bad, you're constructing the UserInfo instance up front, before these operations take place (when there is usually more calls to the UserInfo constructor).

That said, it would seem that your performance issues could lie within the constructor for the UserInfo class, and not in LINQ. Generally, when you let the IQueryable<T> provider perform the operations on the underlying data source, it's generally faster than doing those operations in memory on the client.

Without seeing the constructor code though, it's impossible to tell, but your numbers certainly suggest that the problem lies there, and not in LINQ.