I am trying to find the top 10 most common Strings in an ArrayList + their count (frequency of occurrence).
How may I do this with the best time complexity?
The below code finds the most common word + frequency in the form (String=int)
E.g. the=2
public static Entry<String, Integer> get10MostCommon(WordStream words) {
ArrayList<String> list = new ArrayList<String>();
Map<String, Integer> stringsCount = new HashMap<>();
Map.Entry<String, Integer> mostRepeated = null;
for (String i : words) {
list.add(i);
}
for (String s : list) {
Integer c = stringsCount.get(s);
if (c == null)
c = new Integer(0);
c++;
stringsCount.put(s, c);
}
for (Map.Entry<String, Integer> e : stringsCount.entrySet()) {
if (mostRepeated == null || mostRepeated.getValue() < e.getValue())
mostRepeated = e;
}
return mostRepeated;
}
You could do it in two steps, using Java 8 streams:
Map<String, Long> map = list.stream()
.collect(Collectors.groupingBy(w -> w, Collectors.counting()));
List<Map.Entry<String, Long>> result = map.entrySet().stream()
.sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
.limit(10)
.collect(Collectors.toList());
The first stream maps words to their frequency by using Collectors.groupingBy()
along with Collectors.counting()
.
This returns a map, whose entries are streamed and sorted by map entry value, in reverse order. Then, the stream is limited to keep only 10 elements, which are finally collected to a list.