To count the frequency of each word

user266003 · Mar 29, 2012 · Viewed 15.5k times

There's a directory with a few text files. How do I count the frequency of each word in each file? A word is a sequence of characters consisting of letters, digits, and underscores.

Answer

aKzenT · Mar 31, 2012

Here is a solution that should count all the word frequencies in a file:

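    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    // Adds the word counts of one file to the given dictionary.
    private void countWordsInFile(string file, Dictionary<string, int> words)
    {
        var content = File.ReadAllText(file);

        // \w+ matches runs of letters, digits and underscores.
        var wordPattern = new Regex(@"\w+");

        foreach (Match match in wordPattern.Matches(content))
        {
            // TryGetValue leaves currentCount at 0 for a word not seen before.
            int currentCount = 0;
            words.TryGetValue(match.Value, out currentCount);

            currentCount++;
            words[match.Value] = currentCount;
        }
    }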

You can call this code like this:

    var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);

    countWordsInFile("file1.txt", words);

After this call, words contains every word in the file along with its frequency (e.g. words["test"] returns the number of times "test" appears in the file content). Because the dictionary is created with StringComparer.CurrentCultureIgnoreCase, "Test" and "test" are counted as the same word. If you need to accumulate the results from more than one file, simply call the method for each file with the same dictionary. If you need separate results for each file, create a new dictionary for each file and use a structure like the one @DarkGray suggested.
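
For the per-file case from the question, one possible shape is a dictionary of dictionaries keyed by file path. The following is only a sketch of that idea, not the structure @DarkGray posted: the names WordFrequency, CountWords and CountWordsPerFile are made up for illustration, and the "*.txt" search pattern is an assumption about how the text files are named.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class WordFrequency
{
    // \w+ matches runs of letters, digits and underscores.
    private static readonly Regex WordPattern = new Regex(@"\w+");

    // Counts the words in a piece of text, ignoring case.
    public static Dictionary<string, int> CountWords(string content)
    {
        var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);
        foreach (Match match in WordPattern.Matches(content))
        {
            // currentCount is 0 when the word has not been seen yet.
            words.TryGetValue(match.Value, out int currentCount);
            words[match.Value] = currentCount + 1;
        }
        return words;
    }

    // Returns one word-count dictionary per file, keyed by file path.
    // The "*.txt" default is an assumption; pass another pattern if needed.
    public static Dictionary<string, Dictionary<string, int>> CountWordsPerFile(
        string directory, string searchPattern = "*.txt")
    {
        var result = new Dictionary<string, Dictionary<string, int>>();
        foreach (var file in Directory.EnumerateFiles(directory, searchPattern))
        {
            result[file] = CountWords(File.ReadAllText(file));
        }
        return result;
    }
}
```

With this sketch, WordFrequency.CountWordsPerFile(directory) gives one entry per file, and result[path]["test"] is the count of "test" in that file, provided the word occurs there (the indexer throws for a missing key, so TryGetValue is the safer lookup).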