Wordcloud Python with generate_from_frequencies

cmc_carlos · Mar 27, 2017

I'm trying to create a wordcloud from a CSV file. As an example, the CSV file has the following structure:

a,1
b,2
c,4
j,20

It has more rows, roughly 1800. The first column holds string values (names) and the second column holds their respective frequencies (int). The file is read and each key,value row is stored in a dictionary (d), which is later used to plot the wordcloud:

reader = csv.reader(open('namesDFtoCSV', 'r',newline='\n'))
d = {}
for k,v in reader:
    d[k] = v
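The loop above is where the types go wrong: csv.reader does not convert fields, so every value it yields is a str, including the frequency column. A quick way to see this, using io.StringIO with the sample rows above as a stand-in for the real file (an assumption for illustration only):

```python
import csv
import io

# Stand-in for the real 'namesDFtoCSV' file, built from the sample rows above.
sample = io.StringIO("a,1\nb,2\nc,4\nj,20\n")

d = {}
for k, v in csv.reader(sample):
    d[k] = v  # v is still the raw text of the second column

print(d)  # {'a': '1', 'b': '2', 'c': '4', 'j': '20'}
print({type(v).__name__ for v in d.values()})  # {'str'}
```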

Once the dictionary is populated, I try to plot the wordcloud:

#Generating wordcloud. Relative scaling value is to adjust the importance of a frequency word.
#See documentation: https://github.com/amueller/word_cloud/blob/master/wordcloud/wordcloud.py
wordcloud = WordCloud(width=900,height=500, max_words=1628,relative_scaling=1,normalize_plurals=False).generate_from_frequencies(d)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

But an error is thrown:

Traceback (most recent call last):
  File ".........../script.py", line 19, in <module>
    wordcloud = WordCloud(width=900,height=500, max_words=1628,relative_scaling=1,normalize_plurals=False).generate_from_frequencies(d)
  File "/usr/local/lib/python3.5/dist-packages/wordcloud/wordcloud.py", line 360, in generate_from_frequencies
    for word, freq in frequencies]
  File "/usr/local/lib/python3.5/dist-packages/wordcloud/wordcloud.py", line 360, in <listcomp>
    for word, freq in frequencies]
TypeError: unsupported operand type(s) for /: 'str' and 'float'

Finally, the documentation says:

def generate_from_frequencies(self, frequencies, max_font_size=None):
    """Create a word_cloud from words and frequencies.
    Parameters
    ----------
    frequencies : dict from string to float
        A contains words and associated frequency.
    max_font_size : int
        Use this font-size instead of self.max_font_size
    Returns
    -------
    self

So I don't understand why it throws this error when I meet the requirements of the function. I hope someone can help me, thanks.

Note

I'm using wordcloud 1.3.1.

Answer

RandomTask · Jun 18, 2017

This is because the values in your dictionary are strings, but wordcloud expects integers or floats.
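A minimal reproduction without wordcloud itself. It assumes, as the list comprehension in the traceback suggests, that generate_from_frequencies divides each frequency by a float maximum; the exact library code may differ between versions:

```python
# Dictionary as built by the original loop: every value is a str.
d = {'a': '1', 'b': '2', 'c': '4', 'j': '20'}

# Assumption: mimics the library normalizing each frequency by a float maximum.
max_frequency = max(float(v) for v in d.values())

try:
    scaled = [(word, freq / max_frequency) for word, freq in d.items()]
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for /: 'str' and 'float'

# With numeric values the same expression succeeds.
d_num = {k: int(v) for k, v in d.items()}
scaled = [(word, freq / max_frequency) for word, freq in d_num.items()]
print(scaled)
```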

After running your code and inspecting your dictionary d, I get the following:

In [12]: d

Out[12]: {'a': '1', 'b': '2', 'c': '4', 'j': '20'}

Note that the quotes around the numbers mean these values are strings.

A quick, somewhat hacky fix is to cast v to an int in your for loop:

d[k] = int(v)

I call this hacky because it only works for integer values: if your input contains decimals such as 2.5, int('2.5') raises a ValueError.
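A sketch of a more tolerant conversion, assuming the second column may hold integer or decimal counts: converting with float() accepts both, which also matches the "dict from string to float" wording in the docstring. load_frequencies is a hypothetical helper name, and skipping blank lines is an extra assumption about the input:

```python
import csv
import io


def load_frequencies(lines):
    """Hypothetical helper: build a {word: float} dict from 'word,count' rows.

    float() accepts both '20' and '2.5', so integer and decimal
    counts are handled the same way.
    """
    frequencies = {}
    for row in csv.reader(lines):
        if not row:  # csv.reader yields [] for a blank line; skip it
            continue
        word, count = row
        frequencies[word] = float(count)
    return frequencies


sample = io.StringIO("a,1\nb,2.5\n\nc,4\nj,20\n")
print(load_frequencies(sample))
# {'a': 1.0, 'b': 2.5, 'c': 4.0, 'j': 20.0}
```

With the real file this would be used as `with open('namesDFtoCSV', newline='') as f: d = load_frequencies(f)`; opening with newline='' is what the csv module documentation recommends.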

Also, Python errors can be difficult to read. The key lines of your error above are:

script.py", line 19

TypeError: unsupported operand type(s) for /: 'str' and 'float'

"There's a type error on or before line 19 of my file. Let me look at my data types to see if there is any mismatch between string and float..."
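That "look at my data types" step can be automated. A minimal sketch, assuming you want a clearer error than the one raised deep inside wordcloud; check_frequencies is a hypothetical helper, not part of the library:

```python
from numbers import Real


def check_frequencies(frequencies):
    """Hypothetical pre-flight check: raise early, with a readable message,
    if any value is not a real number (e.g. a str read from a CSV file)."""
    bad = {w: type(f).__name__ for w, f in frequencies.items()
           if not isinstance(f, Real)}
    if bad:
        raise TypeError(f"non-numeric frequencies: {bad}")
    return frequencies


check_frequencies({'a': 1, 'b': 2.5})  # passes silently

try:
    check_frequencies({'a': '1', 'j': 20})
except TypeError as exc:
    print(exc)  # non-numeric frequencies: {'a': 'str'}
```

Calling it just before generate_from_frequencies(d) points straight at the offending words instead of a line inside the library.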

The code below works for me:

import csv
from wordcloud import WordCloud
import matplotlib.pyplot as plt

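d = {}
# newline='' is the csv module's recommended setting; 'with' closes the file afterwards
with open('namesDFtoCSV', 'r', newline='') as f:
    for k, v in csv.reader(f):
        d[k] = int(v)  # csv.reader yields strings, so convert the count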

#Generating wordcloud. Relative scaling value is to adjust the importance of a frequency word.
#See documentation: https://github.com/amueller/word_cloud/blob/master/wordcloud/wordcloud.py
wordcloud = WordCloud(width=900,height=500, max_words=1628,relative_scaling=1,normalize_plurals=False).generate_from_frequencies(d)

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()