Efficient way to clone a HashSet<T>?

Thomas Levesque picture Thomas Levesque · Oct 13, 2010 · Viewed 12.6k times · Source

A few days ago, I answered an interesting question on SO about HashSet<T>. A possible solution involved cloning the hashset, and in my answer I suggested to do something like this:

HashSet<int> original = ...
HashSet<int> clone = new HashSet<int>(original);

Although this approach is quite straightforward, I suspect it's very inefficient: the constructor of the new HashSet<T> needs to separately add each item from the original hashset, and check if it isn't already present. This is clearly a waste of time: since the source collection is a ISet<T>, it is guaranteed not to contain duplicates. There should be a way to take advantage of that knowledge...

Ideally, HashSet<T> should implement ICloneable, but unfortunately it's not the case. I also checked with Reflector to see if the HashSet<T> constructor did something specific if the source collection was a hashset, but it doesn't. It could probably be done by using reflection on private fields, but that would be an ugly hack...

So, did someone come up with a clever solution to clone a hashset more efficiently ?

(Note that this question is purely theoretical, I don't need to do that in a real program)

Answer

jthg picture jthg · Nov 4, 2010

If you really wanted the most efficient way to clone a HashSet<T>, you'd do the following (but possibly at the cost of maintainability)

  1. Use reflector or the debugger to figure out exactly what fields in HashSet<T> need to be copied. You may need to do this recursively for each field.
  2. Use Reflection.Emit or use expression trees to generate a method which does the necessary copying of all of the fields. May need to call other generated methods which copy the value of each field. We're using runtime code generation because it's the only way to directly access private fields.
  3. Use FormatterServices.GetUninitializedObject(...) to instantiate a blank object. Use the method generated in step 2 to copy the original object to the new blank object.