Counting word frequency and making a dictionary from it

user3323103 · Feb 18, 2014 · Viewed 78.6k times

I want to read every word from a text file and store each word's frequency in a dictionary.

Example: 'this is the textfile, and it is used to take words and count'

d = {'this': 1, 'is': 2, 'the': 1, ...} 

I haven't gotten very far, and I can't see how to complete it. My code so far:

import sys

argv = sys.argv[1]
data = open(argv)
words = data.read()
data.close()
wordfreq = {}
for i in words:
    #there should be a counter and somehow it must fill the dict.

Answer

Don · Feb 18, 2014

If you don't want to use collections.Counter, you can write your own function:

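import string
import sys

filename = sys.argv[1]
# Read the whole file; the with-block closes it automatically.
with open(filename) as fp:
    words = fp.read().split()

# Characters to strip from both ends of each word.
# string.punctuation covers ASCII punctuation; extend it as needed.
unwanted_chars = string.punctuation
wordfreq = {}
for raw_word in words:
    word = raw_word.strip(unwanted_chars)
    if not word:
        continue  # the token was punctuation only
    if word not in wordfreq:
        wordfreq[word] = 0
    wordfreq[word] += 1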
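
For comparison, here is a minimal sketch of the collections.Counter route mentioned at the top of this answer. The function name count_words and the sample string are only illustrative (the sample is the example sentence from the question), and stripping string.punctuation is an assumption about which characters you want to ignore.

```python
from collections import Counter
import string


def count_words(text):
    """Return a Counter mapping each word in text to its frequency."""
    # Strip ASCII punctuation from both ends of each whitespace-separated token.
    stripped = (token.strip(string.punctuation) for token in text.split())
    # Skip tokens that consisted of punctuation only.
    return Counter(word for word in stripped if word)


sample = "this is the textfile, and it is used to take words and count"
wordfreq = count_words(sample)
print(wordfreq["is"], wordfreq["textfile"])
```

Counter is a dict subclass, so wordfreq can be used like the plain dictionary above; dict(wordfreq) converts it if you need an ordinary dict.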

For finer control over what counts as a word, look at regular expressions.
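
As a rough sketch of the regular-expression approach, the following uses re.findall to pull out words directly instead of splitting and stripping. The pattern and the decision to lowercase everything are assumptions, not part of the original answer; adjust both to your own definition of a word. The function name count_words_regex is hypothetical.

```python
import re


def count_words_regex(text):
    """Count words, where a word is a run of ASCII letters and apostrophes."""
    wordfreq = {}
    # Lowercase first so "The" and "the" are counted as the same word.
    for word in re.findall(r"[a-z']+", text.lower()):
        wordfreq[word] = wordfreq.get(word, 0) + 1
    return wordfreq


print(count_words_regex("Don't stop. don't STOP!"))
```

With this pattern a hyphenated word such as "well-known" is counted as two words, "well" and "known"; add the hyphen to the character class if you want it kept together.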