I would like to make a word frequency distribution, with the words on the x-axis and the frequency count on the y-axis.
I have the following list:
example_list = [('dhr', 17838), ('mw', 13675), ('wel', 5499), ('goed', 5080),
('contact', 4506), ('medicatie', 3797), ('uur', 3792),
('gaan', 3473), ('kwam', 3463), ('kamer', 3447),
('mee', 3278), ('gesprek', 2978)]
I tried to first convert it into a pandas DataFrame and then use the pd.hist()
as in the example below, but I just can't figure it out and think it is actually straight forward but probably I'm missing something.
import numpy as np
import matplotlib.pyplot as plt
word = []
frequency = []
for i in range(len(example_list)):
word.append(example_list[i][0])
frequency.append(example_list[i][1])
plt.bar(word, frequency, color='r')
plt.show()
You can't pass the word
s into the matplotlib.pyplot.bar
directly. However you could create an indices array for bar
and then replace these indices with the words
using matplotlib.pyplot.xticks
:
import numpy as np
import matplotlib.pyplot as plt
indices = np.arange(len(example_list))
plt.bar(indices, frequency, color='r')
plt.xticks(indices, word, rotation='vertical')
plt.tight_layout()
plt.show()
The for
-loop to create word
and frequency
could also be replaced by a simple zip
und list unpacking:
word, frequency = zip(*example_list)