How to find most common elements of a list?

user434180 picture user434180 · Aug 29, 2010 · Viewed 122.2k times · Source

Given the following list

['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 
 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 
 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 
 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 
 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 
 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 
 'Moon', 'to', 'rise.', '']

I am trying to count how many times each word appears and display the top 3.

However I am only looking to find the top three that have the first letter capitalized and ignore all words that do not have the first letter capitalized.

I am sure there is a better way than this, but my idea was to do the following:

  1. put the first word in the list into another list called uniquewords
  2. delete the first word and all its duplicated from the original list
  3. add the new first word into unique words
  4. delete the first word and all its duplicated from original list.
  5. etc...
  6. until the original list is empty....
  7. count how many times each word in uniquewords appears in the original list
  8. find top 3 and print

Answer

Mark Byers picture Mark Byers · Aug 29, 2010

In Python 2.7 and above there is a class called Counter which can help you:

from collections import Counter
words_to_count = (word for word in word_list if word[:1].isupper())
c = Counter(words_to_count)
print c.most_common(3)

Result:

[('Jellicle', 6), ('Cats', 5), ('And', 2)]

I am quite new to programming so please try and do it in the most barebones fashion.

You could instead do this using a dictionary with the key being a word and the value being the count for that word. First iterate over the words adding them to the dictionary if they are not present, or else increasing the count for the word if it is present. Then to find the top three you can either use a simple O(n*log(n)) sorting algorithm and take the first three elements from the result, or you can use a O(n) algorithm that scans the list once remembering only the top three elements.

An important observation for beginners is that by using builtin classes that are designed for the purpose you can save yourself a lot of work and/or get better performance. It is good to be familiar with the standard library and the features it offers.