I have a Pandas dataframe with one column: Crime type. The column contains 16 different "categories" of crime, which I would like to visualise as a word cloud, with words sized based on their frequency within the dataframe.
I have attempted to do this with the following code:
To bring the data in:
fields = ['Crime type']
text2 = pd.read_csv('allCrime.csv', usecols=fields)
To generate the word cloud:
wordcloud2 = WordCloud().generate(text2)
# Generate plot
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()
However, I get this error:
TypeError: expected string or bytes-like object
I was able to create an earlier word cloud from the full dataset, using the following code, but I want the word cloud to only generate words from the specific column, 'crime type' ('allCrime.csv' contains approx. 13 columns):
text = open('allCrime.csv').read()
wordcloud = WordCloud().generate(text)
# Generate plot
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
I'm new to Python and Pandas (and coding generally!) so all help is gratefully received.
The problem is that the WordCloud.generate method you are using expects a single string, in which it counts the word occurrences, but you are passing it a pandas DataFrame (text2 is what pd.read_csv returns, even when usecols selects only one column).
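To make the type mismatch concrete, here is a minimal sketch. The in-memory CSV and its values are made up for illustration and only stand in for 'allCrime.csv'; the 'Month' column is likewise an assumption.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'allCrime.csv'; the rows are invented for illustration.
csv_data = io.StringIO(
    "Crime type,Month\n"
    "Burglary,2017-01\n"
    "Robbery,2017-01\n"
    "Burglary,2017-02\n"
)

text2 = pd.read_csv(csv_data, usecols=['Crime type'])

# read_csv returns a DataFrame, even when only one column is requested.
print(type(text2).__name__)                # DataFrame

# Selecting the column by name gives a Series of strings.
print(type(text2['Crime type']).__name__)  # Series

# Joining the values of that Series produces the plain str that generate() expects.
joined = ' '.join(text2['Crime type'])
print(type(joined).__name__)               # str
```

Note that the column name is case-sensitive: it has to match the header in the file ('Crime type' here) exactly.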
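Depending on what you want the word cloud to be built from, you can do one of two things. Either join the column into one string:
wordcloud2 = WordCloud().generate(' '.join(text2['Crime type']))
which concatenates every value in the 'Crime type' column and then counts the words in the resulting string, so the cloud is built from the words inside the category names rather than from the categories themselves. (The column name must match the CSV header exactly, including case.)
Or use WordCloud.generate_from_frequencies to pass in frequencies you have computed yourself, for example the number of rows in each category.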
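As a rough sketch of the second option, the frequencies could be computed with pandas' value_counts and handed to generate_from_frequencies as a dict. The helper names crime_type_frequencies and show_word_cloud, and the sample rows, are made up for this example; they are not part of pandas or wordcloud.

```python
import pandas as pd


def crime_type_frequencies(df, column='Crime type'):
    """Return {category: number of rows} for one DataFrame column."""
    # value_counts() counts each distinct value; missing values are skipped by default.
    return df[column].value_counts().to_dict()


def show_word_cloud(frequencies):
    """Draw a word cloud with labels sized by the given {label: count} mapping."""
    # Imported here so the counting step above can run without the plotting libraries.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud().generate_from_frequencies(frequencies)
    plt.imshow(cloud, interpolation='bilinear')
    plt.axis("off")
    plt.show()


# Invented sample rows standing in for the real file.
sample = pd.DataFrame({'Crime type': ['Burglary', 'Anti-social behaviour',
                                      'Burglary', 'Vehicle crime', 'Burglary']})
frequencies = crime_type_frequencies(sample)
print(frequencies)
```

With the real data this would be called as show_word_cloud(crime_type_frequencies(text2)). Because each dict key is treated as a single label, a multi-word category should appear in the cloud as one phrase sized by how many rows it has, which seems closest to what the question asks for.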