How does HashSet compare elements for equality?

nima picture nima · Jan 21, 2012 · Viewed 82.9k times · Source

I have a class that is IComparable:

public class a : IComparable
{
    public int Id { get; set; }
    public string Name { get; set; }

    public a(int id)
    {
        this.Id = id;
    }

    public int CompareTo(object obj)
    {
        return this.Id.CompareTo(((a)obj).Id);
    }
}

When I add a list of object of this class to a hash set:

a a1 = new a(1);
a a2 = new a(2);
HashSet<a> ha = new HashSet<a>();
ha.add(a1);
ha.add(a2);
ha.add(a1);

Everything is fine and ha.count is 2, but:

a a1 = new a(1);
a a2 = new a(2);
HashSet<a> ha = new HashSet<a>();
ha.add(a1);
ha.add(a2);
ha.add(new a(1));

Now ha.count is 3.

  1. Why doesn't HashSet respect a's CompareTo method.
  2. Is HashSet the best way to have a list of unique objects?

Answer

Jon Skeet picture Jon Skeet · Jan 21, 2012

It uses an IEqualityComparer<T> (EqualityComparer<T>.Default unless you specify a different one on construction).

When you add an element to the set, it will find the hash code using IEqualityComparer<T>.GetHashCode, and store both the hash code and the element (after checking whether the element is already in the set, of course).

To look an element up, it will first use the IEqualityComparer<T>.GetHashCode to find the hash code, then for all elements with the same hash code, it will use IEqualityComparer<T>.Equals to compare for actual equality.

That means you have two options:

  • Pass a custom IEqualityComparer<T> into the constructor. This is the best option if you can't modify the T itself, or if you want a non-default equality relation (e.g. "all users with a negative user ID are considered equal"). This is almost never implemented on the type itself (i.e. Foo doesn't implement IEqualityComparer<Foo>) but in a separate type which is only used for comparisons.
  • Implement equality in the type itself, by overriding GetHashCode and Equals(object). Ideally, implement IEquatable<T> in the type as well, particularly if it's a value type. These methods will be called by the default equality comparer.

Note how none of this is in terms of an ordered comparison - which makes sense, as there are certainly situations where you can easily specify equality but not a total ordering. This is all the same as Dictionary<TKey, TValue>, basically.

If you want a set which uses ordering instead of just equality comparisons, you should use SortedSet<T> from .NET 4 - which allows you to specify an IComparer<T> instead of an IEqualityComparer<T>. This will use IComparer<T>.Compare - which will delegate to IComparable<T>.CompareTo or IComparable.CompareTo if you're using Comparer<T>.Default.