I have a long list of words, and I want to generate a histogram of the frequency of each word in my list. I was able to do that in the code below:
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
word_list = ['A','A','B','B','A','C','C','C','C']
counts = Counter(word_list)
labels, values = zip(*counts.items())
indexes = np.arange(len(labels))
plt.bar(indexes, values)
plt.show()
It doesn't, however, display the bins by rank (i.e. by frequency, so that the highest frequency is the first bin on the left, and so on), even though when I print counts it orders them for me: Counter({'C': 4, 'A': 3, 'B': 2}). How could I achieve that?
You can achieve the desired output by sorting your data first and then passing the ordered arrays to bar; below I use numpy.argsort for that. The resulting plot has the bars in descending order of frequency (I also added the labels to the bars).
Here is the code that produces the plot with a few inline comments:
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
word_list = ['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']
counts = Counter(word_list)
labels, values = zip(*counts.items())
# sort your values in descending order
indSort = np.argsort(values)[::-1]
# rearrange your data
labels = np.array(labels)[indSort]
values = np.array(values)[indSort]
indexes = np.arange(len(labels))
plt.bar(indexes, values)
# add labels; since matplotlib 2.0 bars are centered on their x positions,
# so the ticks need no offset
plt.xticks(indexes, labels)
plt.show()
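One detail of the sorting step: reversing an ascending argsort does not guarantee any particular order for words that have the same count. The following is a minimal sketch of a variant that keeps tied words in the order they first appear in the list. The function name rank_by_frequency is made up for this illustration, and the sketch assumes Python 3.7+ (where Counter keeps insertion order) and NumPy 1.15+ (for kind="stable").

```python
from collections import Counter

import numpy as np


def rank_by_frequency(words):
    """Return (labels, values) as arrays, most frequent word first.

    Hypothetical helper for illustration; ties keep first-seen order.
    """
    counts = Counter(words)
    labels = np.array(list(counts.keys()))
    values = np.array(list(counts.values()))
    # Negating the counts turns an ascending sort into a descending one;
    # a stable sort leaves equal counts in their original relative order.
    order = np.argsort(-values, kind="stable")
    return labels[order], values[order]


word_list = ['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']
labels, values = rank_by_frequency(word_list)
print(labels.tolist(), values.tolist())  # ['C', 'A', 'B'] [4, 3, 2]
```

The returned arrays can be passed to plt.bar and plt.xticks exactly as in the code above.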
In case you want to plot only the first n entries, you can replace the line
counts = Counter(word_list)
by
counts = dict(Counter(word_list).most_common(n))
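In the case above, counts would then be {'C': 4, 'A': 3} for n = 2 (on Python 3.7+, where dicts keep insertion order; older versions may print the two entries in a different order).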
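As a quick check of the top-n replacement, here is a small sketch that can be run without matplotlib. The helper name top_n_counts is invented for this example, and it assumes Python 3.7+, where a dict built from most_common(n) keeps the highest count first.

```python
from collections import Counter


def top_n_counts(words, n):
    """Return a dict of the n most frequent words (hypothetical helper)."""
    # most_common(n) yields (word, count) pairs, highest count first.
    return dict(Counter(words).most_common(n))


word_list = ['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']
counts = top_n_counts(word_list, 2)
print(counts)  # {'C': 4, 'A': 3}

# Unpacking works the same way as with the full Counter.
labels, values = zip(*counts.items())
print(labels, values)  # ('C', 'A') (4, 3)
```

Because the entries already come out ranked, the argsort step afterwards is harmless but no longer strictly needed in this case.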
If you would like to remove the frame of the plot and label the bars directly, you can check this post.
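The linked post is not reproduced here, so the following is only a sketch of one possible way to do it, not necessarily the approach that post takes. It hides the axes spines (the frame) and writes each count above its bar with ax.text; the function name plot_without_frame is an assumption made for this example.

```python
import matplotlib.pyplot as plt


def plot_without_frame(labels, values):
    """Bar chart without a frame, with each count written above its bar.

    Hypothetical helper for illustration only.
    """
    fig, ax = plt.subplots()
    positions = list(range(len(labels)))
    bars = ax.bar(positions, values)

    # The frame consists of four "spines"; hide all of them.
    for spine in ax.spines.values():
        spine.set_visible(False)

    # With the counts written on the bars, the y axis is redundant.
    ax.yaxis.set_visible(False)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)

    # Place each count just above its bar, centered horizontally.
    for bar, value in zip(bars, values):
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
                str(value), ha="center", va="bottom")
    return fig, ax


fig, ax = plot_without_frame(['C', 'A', 'B'], [4, 3, 2])
```

Calling plt.show() afterwards displays the figure; the labels and values passed in are expected to be sorted already, as in the code above.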