Looking at the source of Java 6, HashSet<E>
is actually implemented using HashMap<E,Object>
, using dummy object instance on every entry of the Set.
I think that wastes 4 byte (on 32-bit machines) for the size of the entry itself.
But, why is it still used? Is there any reason to use it besides making it easier to maintain the codes?
Actually, it's not just HashSet
. All implementations of the Set
interface in Java 6 are based on an underlying Map
. This is not a requirement; it's just the way the implementation is. You can see for yourself by checking out the documentation for the various implementations of Set
.
Your main questions are
But, why is it still used? Is there any reason to use it besides making it easier to maintain the codes?
I assume that code maintenance is a big motivating factor. So is preventing duplication and bloat.
Set
and Map
are similar interfaces, in that duplicate elements are not allowed. (I think the only Set
not backed by a Map
is CopyOnWriteArraySet
, which is an unusual Collection, because it's immutable.)
Specifically:
From the documentation of Set
:
A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction.
The Set interface places additional stipulations, beyond those inherited from the Collection interface, on the contracts of all constructors and on the contracts of the add, equals and hashCode methods. Declarations for other inherited methods are also included here for convenience. (The specifications accompanying these declarations have been tailored to the Set interface, but they do not contain any additional stipulations.)
The additional stipulation on constructors is, not surprisingly, that all constructors must create a set that contains no duplicate elements (as defined above).
And from Map
:
An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.
If you can implement your Set
s using existing code, any benefit (speed, for example) you can realize from existing code accrues to your Set
as well.
If you choose to implement a Set
without a Map
backing, you have to duplicate code designed to prevent duplicate elements. Ah, the delicious irony.
That said, there's nothing preventing you from implementing your Set
s differently.