I have a linq query that I am using Distinct()
on. If I just call Distinct()
without converting to a List then it does not return a distinct list - it still contains duplicates.
However if I convert to a List and then call Distinct()
- it works as expected and I only get unique objects.
I'm using Telerik ORM and the objects being returned are the class representing one of the tables in the database.
var uniqueUsers = (from u in Database.Users
select u).Distinct();
The code above does not produce distinct results, however when I convert to a list and call distinct - it does:
var uniqueUsers = (from u in Database.Users
select u).ToList().Distinct();
I suspect this has to do with the collection before being converted to a list, comparing references to objects rather than the object data itself but I do not fully understand what is going on - why does the fist code example not produce unique results and what happens to the collection when using .ToList()
that makes it work?
[EDIT] I've simplified the above queries, in the real world the query has several joins which generates non-unique results, however I am returning just the User objects.
I tried overriding the Equals
and GetHashCode
methods but this did not make any difference.
public override bool Equals(object obj)
{
User comparingObject = obj as User ;
if (comparingObject == null)
{
return false;
}
else
{
return comparingObject.UserID.Equals(this.UserID);
}
}
public override int GetHashCode()
{
return this.UserID.GetHashCode();
}
[UPDATE] Having run the same query in LinqPad, it works as expected providing a list of distinct entries. However running the same query in LinqPad when using the Telerik ORM dll I get multiple entries. So it appears to be a peculiarity with Telerik. When I have time I will investigate further and raise it with Telerik support.
Obviously you cannot have exact duplicate rows (including the primary key) in your table. Probably what you mean is rows with some equal fields (excluding primary key).
Calling Distinct
on IQueryable
, generates a SQL DISTINCT
operator on the resulting query, which compares every field of your table against each other. Because you cannot have exact duplicate rows in the table, it returns all the rows.
On the other hand, calling Distinct
on a List<User>
will use Equals
method of the User
object to compare objects in memory (after fetching all the rows from database). The final result depends on the implementation of Equals
method, which could check only some fields for equal values.