In Java 8, how can I filter a collection using the Stream API by checking the distinctness of a property of each object?
For example, I have a list of Person objects and I want to remove people with the same name:
persons.stream().distinct();
This uses the default equality check for a Person object, so I need something like:
persons.stream().distinct(p -> p.getName());
Unfortunately, the distinct() method has no such overload. Without modifying the equality check inside the Person class, is it possible to do this succinctly?
Consider distinct to be a stateful filter. Here is a function that returns a predicate that maintains state about the keys it has seen previously, and that returns whether the given element's key is being seen for the first time:
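public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    // Thread-safe set of the keys seen so far, so the predicate also works on parallel streams
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    // Set.add returns true only if the key was not already present
    return t -> seen.add(keyExtractor.apply(t));
}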
Then you can write:
persons.stream().filter(distinctByKey(Person::getName)).collect(Collectors.toList());
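A minimal, self-contained sketch of the helper in use. The Person class here is a hypothetical stand-in for the one in the question (just a name and an age, with equals/hashCode left at their identity-based defaults), and agesOfDistinctNames is an illustrative name, not part of the question or answer. Because the stream is sequential, the first person with each name is the one that survives.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DistinctByKeyDemo {

    // Hypothetical stand-in for the question's Person class; equals/hashCode
    // are deliberately NOT overridden, so distinct() alone would keep everyone.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Same helper as in the answer: remembers every key it has seen and
    // lets an element through only the first time its key shows up.
    public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Keeps one Person per name (sequential stream, so the first one wins)
    // and returns their ages, which makes it easy to see which copy survived.
    static List<Integer> agesOfDistinctNames(List<Person> persons) {
        return persons.stream()
                .filter(distinctByKey(Person::getName))
                .map(Person::getAge)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Alice", 30),
                new Person("Bob", 25),
                new Person("Alice", 41));
        System.out.println(agesOfDistinctNames(persons)); // prints [30, 25]
    }
}
```

Since the predicate returned by distinctByKey carries state, create a fresh one for each stream pipeline rather than storing and reusing it.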
Note that if the stream is ordered and run in parallel, this will keep an arbitrary element from among the duplicates, rather than the first one as distinct() does.
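If the first element per key must be kept even on an ordered parallel stream, one option (a different technique from the filter above, not part of the original answer) is to collect into an insertion-ordered map with Collectors.toMap and a merge function that keeps the existing value. Collectors.toMap is not a concurrent collector, so for an ordered stream the partial maps are combined in encounter order and the earlier value wins the merge. The firstByKey name and the sample data are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FirstByKeyDemo {

    // Keeps the FIRST element (in encounter order) for each key, even though the
    // stream is parallel: toMap with a "keep the existing value" merge function
    // and an insertion-ordered LinkedHashMap as the result container.
    static <T, K> List<T> firstByKey(List<T> items, Function<? super T, ? extends K> keyExtractor) {
        LinkedHashMap<K, T> firstSeen = items.parallelStream()
                .collect(Collectors.toMap(
                        keyExtractor,
                        Function.identity(),
                        (existing, later) -> existing, // earlier value wins
                        LinkedHashMap::new));
        return new ArrayList<>(firstSeen.values());
    }

    public static void main(String[] args) {
        // Strings "w0".."w999"; the key is the last character, so 100 strings share each key.
        List<String> words = IntStream.range(0, 1000)
                .mapToObj(i -> "w" + i)
                .collect(Collectors.toList());
        System.out.println(firstByKey(words, w -> w.charAt(w.length() - 1)));
        // prints [w0, w1, w2, w3, w4, w5, w6, w7, w8, w9]
    }
}
```

The trade-off is that this materializes every distinct key in a map before producing any result, whereas the filter-based approach stays lazy and can be combined with short-circuiting operations such as limit.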
(This is essentially the same as my answer to this question: Java Lambda Stream Distinct() on arbitrary key?)