Java 8 Distinct by property

RichK · May 16, 2014

In Java 8, how can I filter a collection using the Stream API so that its elements are distinct by a property of each object?

For example, I have a list of Person objects and I want to remove people with the same name. Calling

persons.stream().distinct();

uses the default equality check for a Person object, so I need something like

persons.stream().distinct(p -> p.getName());

Unfortunately, the distinct() method has no such overload. Is it possible to do this succinctly without modifying the equality check inside the Person class?

Answer

Stuart Marks · Jan 10, 2015

Consider distinct to be a stateful filter. Here is a function that returns a predicate that maintains state about what it's seen previously, and that returns whether the given element was seen for the first time:

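// Requires java.util.Set, java.util.concurrent.ConcurrentHashMap,
// java.util.function.Function and java.util.function.Predicate.
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    // Thread-safe set of the keys seen so far; each returned predicate gets its own set.
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    // Set.add returns true only the first time a given key is added.
    return t -> seen.add(keyExtractor.apply(t));
}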

Then you can write:

persons.stream().filter(distinctByKey(Person::getName))
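For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The Person class, its fields, the sample data, and the uniqueByName helper are all assumptions made up for illustration; only distinctByKey comes from the answer itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKeyDemo {

    // Hypothetical stand-in for the question's Person class.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // The stateful predicate from the answer: true only for the first element seen with each key.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Hypothetical helper: collects one Person per distinct name.
    static List<Person> uniqueByName(List<Person> persons) {
        return persons.stream()
                .filter(distinctByKey(Person::getName))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Alice", 30),
                new Person("Bob", 25),
                new Person("Alice", 41));

        // Sequential stream, so the first Person with each name is the one kept.
        for (Person p : uniqueByName(persons)) {
            System.out.println(p.getName() + " " + p.getAge());
        }
    }
}
```

Because distinctByKey creates a fresh set on every call, it should be called once per pipeline, as here; reusing a single returned predicate across two streams would carry the seen keys over from the first stream to the second.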

Note that if the stream is ordered and is run in parallel, this will preserve an arbitrary element from among the duplicates, instead of the first one, as distinct() does.
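If keeping the first occurrence matters, one alternative is to collect into a LinkedHashMap keyed by the property and let the merge function keep the earlier value. This is a swapped-in technique, not the approach from the answer above, and the firstByKey name and signature are hypothetical. As far as I can tell, Collectors.toMap respects encounter order for ordered streams, so this should keep the first element even in parallel, at the cost of materializing the whole result instead of filtering lazily.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class FirstByKeyDemo {

    // Hypothetical helper: keeps the first element seen for each key, in encounter order.
    static <T, K> List<T> firstByKey(Collection<T> items, Function<T, K> keyExtractor) {
        Map<K, T> byKey = items.stream()
                .collect(Collectors.toMap(
                        keyExtractor,
                        Function.identity(),
                        (first, later) -> first,   // on a duplicate key, keep the earlier element
                        LinkedHashMap::new));      // preserves insertion order of the keys
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
        // Key is the first letter, so one word per initial letter survives.
        System.out.println(firstByKey(words, w -> w.substring(0, 1)));
    }
}
```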

(This is essentially the same as my answer to this question: Java Lambda Stream Distinct() on arbitrary key?)