I adapted this question from here, with my own changes. I have the following code:
import nltk

def content_text(text):
    # text is a list of tokens; this keeps only the tokens that are English stopwords.
    stopwords = nltk.corpus.stopwords.words('english')
    content = [w for w in text if w.lower() in stopwords]
    return content
How can I print the 10 most frequently occurring words of a text, 1) including and 2) excluding stopwords?
NLTK provides a FreqDist class for this:
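import nltk
# Tokenize the raw text (a string) into words.
allWords = nltk.tokenize.word_tokenize(text)
# 1) Frequencies of all words, stopwords included.
allWordDist = nltk.FreqDist(w.lower() for w in allWords)
# 2) Frequencies with stopwords excluded; lowercase each word before the membership test.
stopwords = set(nltk.corpus.stopwords.words('english'))
allWordExceptStopDist = nltk.FreqDist(w.lower() for w in allWords if w.lower() not in stopwords)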
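To extract the 10 most common words:
# most_common(10) returns a list of (word, count) pairs; keep only the words.
mostCommon = [word for word, count in allWordDist.most_common(10)]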
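If you want to try the idea without downloading the NLTK data first, here is a minimal sketch of the same approach using only the standard library. It relies on collections.Counter, which nltk.FreqDist subclasses in NLTK 3, so most_common behaves the same way. The names top_words and SAMPLE_STOPWORDS, the regex tokenizer, and the tiny stopword set are assumptions made for illustration; in real code you would use word_tokenize and stopwords.words('english') as above.

```python
import re
from collections import Counter

# Hypothetical stand-in for nltk.corpus.stopwords.words('english').
SAMPLE_STOPWORDS = {"the", "a", "on", "and", "is", "of"}


def top_words(text, n=10, stopwords=None):
    """Return the n most frequent lowercase words as (word, count) pairs.

    If stopwords is given, words in that collection are skipped.
    """
    # Crude tokenizer (assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if stopwords is not None:
        tokens = [t for t in tokens if t not in stopwords]
    return Counter(tokens).most_common(n)


sample = "The cat sat on the mat and the dog sat on the cat"

# 1) Including stopwords.
with_stop = top_words(sample, n=3)
# 2) Excluding stopwords.
without_stop = top_words(sample, n=3, stopwords=SAMPLE_STOPWORDS)

print(with_stop)
print(without_stop)
```

Lowercasing before both the counting and the stopword test is the important part: otherwise a capitalised "The" slips past the filter and is counted as a content word.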