Finding anagrams in a word list

user808066 picture user808066 · Jun 21, 2011 · Viewed 7.2k times · Source

I have a word list and a file containing a number of anagrams. These anagrams are words found in the word list. I need to develop an algorithm to find the matching words and produce them in an output file. The code I have developed so far has only worked for the first two words. In addition, I can't get the code to play nice with strings containing numbers anywhere in it. Please tell me how I can fix the code.

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main (void)
{
int x = 0, y = 0;
int a = 0, b = 0;
int emptyx, emptyy;
int match = 0;
ifstream f1, f2;
ofstream f3;
string line, line1[1500], line2[50];
size_t found;

f1.open ("wordlist.txt");
f2.open ("file.txt");
f3.open ("output.txt");

while (f1.eof() == 0)
{
    getline (f1, line);
    line1[x] = line;
    x++;
}

while (f2.eof() == 0)
{
    getline (f2, line);
    line2[y] = line;
    y++;
}

//finds position of last elements
emptyx = x-1;
emptyy = y-1;

//matching algorithm
for (y = 0; y <= emptyy; y++)
{
    for (x = 0; x <= emptyx; x++)
    {
        if (line2[y].length() == line1[x].length())
        {
            for (a = 0; a < line1[x].length(); a++)
            {
                found = line2[y].find(line1[x][a]);
                if (found != string::npos)
                {
                    match++;
                    line2[y].replace(found, 1, 1, '.');

                    if (match == line1[x].length())
                    {
                        f3 << line1[x] << ", ";
                        match = 0;
                    }
                }
            }
        }
    }
}

f1.close();
f2.close();
f3.close();

return 0;
}

Answer

Mitch Wheat picture Mitch Wheat · Jun 21, 2011

Step 1: Build an index with a key of the sorted characters in each word in the wordlist and with the value being the the word.

act   -  cat
act   -  act
dgo   -  dog

...

aeeilnppp - pineapple

....

etc...

Step 2: For each anagram you want to find, sort the characters in your anagram word, and then match against the index to retrieve all words from index with matching sorted key.