IQueryable.Distinct() vs List.Distinct()

Chris B picture Chris B · Jul 1, 2013 · Viewed 7.2k times · Source

I have a linq query that I am using Distinct() on. If I just call Distinct() without converting to a List then it does not return a distinct list - it still contains duplicates.

However if I convert to a List and then call Distinct() - it works as expected and I only get unique objects.

I'm using Telerik ORM and the objects being returned are the class representing one of the tables in the database.

var uniqueUsers = (from u in Database.Users 
                   select u).Distinct();

The code above does not produce distinct results, however when I convert to a list and call distinct - it does:

var uniqueUsers = (from u in Database.Users 
                   select u).ToList().Distinct();

I suspect this has to do with the collection before being converted to a list, comparing references to objects rather than the object data itself but I do not fully understand what is going on - why does the fist code example not produce unique results and what happens to the collection when using .ToList() that makes it work?

[EDIT] I've simplified the above queries, in the real world the query has several joins which generates non-unique results, however I am returning just the User objects.

I tried overriding the Equals and GetHashCode methods but this did not make any difference.

public override bool Equals(object obj)
{
    User comparingObject = obj as User ;

    if (comparingObject == null)
    {
        return false;
    }
    else
    {
        return comparingObject.UserID.Equals(this.UserID);
    }
}

public override int GetHashCode()
{
    return this.UserID.GetHashCode();
}

[UPDATE] Having run the same query in LinqPad, it works as expected providing a list of distinct entries. However running the same query in LinqPad when using the Telerik ORM dll I get multiple entries. So it appears to be a peculiarity with Telerik. When I have time I will investigate further and raise it with Telerik support.

Answer

Mohammad Dehghan picture Mohammad Dehghan · Jul 1, 2013

Obviously you cannot have exact duplicate rows (including the primary key) in your table. Probably what you mean is rows with some equal fields (excluding primary key).

Calling Distinct on IQueryable, generates a SQL DISTINCT operator on the resulting query, which compares every field of your table against each other. Because you cannot have exact duplicate rows in the table, it returns all the rows.

On the other hand, calling Distinct on a List<User> will use Equals method of the User object to compare objects in memory (after fetching all the rows from database). The final result depends on the implementation of Equals method, which could check only some fields for equal values.