Several Linq.Enumerable functions take an IEqualityComparer<T>
. Is there a convenient wrapper class that adapts a delegate(T,T)=>bool
to implement IEqualityComparer<T>
? It's easy enough to write one (if your ignore problems with defining a correct hashcode), but I'd like to know if there is an out-of-the-box solution.
Specifically, I want to do set operations on Dictionary
s, using only the Keys to define membership (while retaining the values according to different rules).
GetHashCode
Others have already commented on the fact that any custom IEqualityComparer<T>
implementation should really include a GetHashCode
method; but nobody's bothered to explain why in any detail.
Here's why. Your question specifically mentions the LINQ extension methods; nearly all of these rely on hash codes to work properly, because they utilize hash tables internally for efficiency.
Take Distinct
, for example. Consider the implications of this extension method if all it utilized were an Equals
method. How do you determine whether an item's already been scanned in a sequence if you only have Equals
? You enumerate over the entire collection of values you've already looked at and check for a match. This would result in Distinct
using a worst-case O(N2) algorithm instead of an O(N) one!
Fortunately, this isn't the case. Distinct
doesn't just use Equals
; it uses GetHashCode
as well. In fact, it absolutely does not work properly without an IEqualityComparer<T>
that supplies a proper GetHashCode
. Below is a contrived example illustrating this.
Say I have the following type:
class Value
{
public string Name { get; private set; }
public int Number { get; private set; }
public Value(string name, int number)
{
Name = name;
Number = number;
}
public override string ToString()
{
return string.Format("{0}: {1}", Name, Number);
}
}
Now say I have a List<Value>
and I want to find all of the elements with a distinct name. This is a perfect use case for Distinct
using a custom equality comparer. So let's use the Comparer<T>
class from Aku's answer:
var comparer = new Comparer<Value>((x, y) => x.Name == y.Name);
Now, if we have a bunch of Value
elements with the same Name
property, they should all collapse into one value returned by Distinct
, right? Let's see...
var values = new List<Value>();
var random = new Random();
for (int i = 0; i < 10; ++i)
{
values.Add("x", random.Next());
}
var distinct = values.Distinct(comparer);
foreach (Value x in distinct)
{
Console.WriteLine(x);
}
Output:
x: 1346013431 x: 1388845717 x: 1576754134 x: 1104067189 x: 1144789201 x: 1862076501 x: 1573781440 x: 646797592 x: 655632802 x: 1206819377
Hmm, that didn't work, did it?
What about GroupBy
? Let's try that:
var grouped = values.GroupBy(x => x, comparer);
foreach (IGrouping<Value> g in grouped)
{
Console.WriteLine("[KEY: '{0}']", g);
foreach (Value x in g)
{
Console.WriteLine(x);
}
}
Output:
[KEY = 'x: 1346013431'] x: 1346013431 [KEY = 'x: 1388845717'] x: 1388845717 [KEY = 'x: 1576754134'] x: 1576754134 [KEY = 'x: 1104067189'] x: 1104067189 [KEY = 'x: 1144789201'] x: 1144789201 [KEY = 'x: 1862076501'] x: 1862076501 [KEY = 'x: 1573781440'] x: 1573781440 [KEY = 'x: 646797592'] x: 646797592 [KEY = 'x: 655632802'] x: 655632802 [KEY = 'x: 1206819377'] x: 1206819377
Again: didn't work.
If you think about it, it would make sense for Distinct
to use a HashSet<T>
(or equivalent) internally, and for GroupBy
to use something like a Dictionary<TKey, List<T>>
internally. Could this explain why these methods don't work? Let's try this:
var uniqueValues = new HashSet<Value>(values, comparer);
foreach (Value x in uniqueValues)
{
Console.WriteLine(x);
}
Output:
x: 1346013431 x: 1388845717 x: 1576754134 x: 1104067189 x: 1144789201 x: 1862076501 x: 1573781440 x: 646797592 x: 655632802 x: 1206819377
Yeah... starting to make sense?
Hopefully from these examples it's clear why including an appropriate GetHashCode
in any IEqualityComparer<T>
implementation is so important.
Expanding on orip's answer:
There are a couple of improvements that can be made here.
Func<T, TKey>
instead of Func<T, object>
; this will prevent boxing of value type keys in the actual keyExtractor
itself.where TKey : IEquatable<TKey>
constraint; this will prevent boxing in the Equals
call (object.Equals
takes an object
parameter; you need an IEquatable<TKey>
implementation to take a TKey
parameter without boxing it). Clearly this may pose too severe a restriction, so you could make a base class without the constraint and a derived class with it.Here's what the resulting code might look like:
public class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
protected readonly Func<T, TKey> keyExtractor;
public KeyEqualityComparer(Func<T, TKey> keyExtractor)
{
this.keyExtractor = keyExtractor;
}
public virtual bool Equals(T x, T y)
{
return this.keyExtractor(x).Equals(this.keyExtractor(y));
}
public int GetHashCode(T obj)
{
return this.keyExtractor(obj).GetHashCode();
}
}
public class StrictKeyEqualityComparer<T, TKey> : KeyEqualityComparer<T, TKey>
where TKey : IEquatable<TKey>
{
public StrictKeyEqualityComparer(Func<T, TKey> keyExtractor)
: base(keyExtractor)
{ }
public override bool Equals(T x, T y)
{
// This will use the overload that accepts a TKey parameter
// instead of an object parameter.
return this.keyExtractor(x).Equals(this.keyExtractor(y));
}
}