Java: how to find top 10 most common String + frequency in ArrayList?

Iona picture Iona · Mar 14, 2016 · Viewed 9.6k times · Source

I am trying to find the top 10 most common Strings in an ArrayList + their count (frequency of occurrence).

How may I do this with the best time complexity?

The below code finds the most common word + frequency in the form (String=int)

E.g. the=2

  public static Entry<String, Integer> get10MostCommon(WordStream words) {

    ArrayList<String> list = new ArrayList<String>();
    Map<String, Integer> stringsCount = new HashMap<>();
    Map.Entry<String, Integer> mostRepeated = null;

    for (String i : words) {
      list.add(i);
    }

    for (String s : list) {
      Integer c = stringsCount.get(s);
      if (c == null)
        c = new Integer(0);
      c++;
      stringsCount.put(s, c);
    }

    for (Map.Entry<String, Integer> e : stringsCount.entrySet()) {
      if (mostRepeated == null || mostRepeated.getValue() < e.getValue())
        mostRepeated = e;
    }
    return mostRepeated;
  }

Answer

fps picture fps · Mar 14, 2016

You could do it in two steps, using Java 8 streams:

Map<String, Long> map = list.stream()
        .collect(Collectors.groupingBy(w -> w, Collectors.counting()));

List<Map.Entry<String, Long>> result = map.entrySet().stream()
        .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
        .limit(10)
        .collect(Collectors.toList());

The first stream maps words to their frequency by using Collectors.groupingBy() along with Collectors.counting().

This returns a map, whose entries are streamed and sorted by map entry value, in reverse order. Then, the stream is limited to keep only 10 elements, which are finally collected to a list.