Anagram Algorithm in PHP

khiemnn · May 17, 2012 · Viewed 12.7k times

I'm a complete newbie with PHP. Today I ran into a problem that I can't work out how to solve, even after searching Google and digging through Stack Overflow. It's the anagram algorithm.

So basically, I understand the problem: when the user inputs a string, I split it and compare it with my library (a given array). Then I have to join the characters in groups of 2, 3, and so on, and compare again. That is exactly where I'm stuck: I don't know how to join the elements of the array.

Here is the code that I'm implementing, and also a sample dictionary.

I have a self-made dictionary with these elements in the array $dict, and a form where users input a string. The input string is passed to the code below as $anagram. I have to split the input string to compare it with my dictionary, but I don't know how to join the letters so that I can compare groups of 2 letters, 3 letters, and so on against the dictionary.

<?php

$dict = array(
'abde',
'des',
'klajsd',
'ksj',
'hat',
'good',
'book',
'puzzle',
'local',
'php',
'e');

$anagram = $_POST['anagram'];
//change to lowercase
$anagram = strtolower($anagram);

//split the string
$test = str_split($anagram);

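//compare each single letter with $dict (no joining yet)
for ($i=0; $i<count($test); $i++) {
    //in_array() checks the letter against every dictionary entry,
    //instead of only the entry that happens to sit at index $i
    if (in_array($test[$i], $dict)) {
        echo $test[$i]."<br />";
    }
}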

//problem: how to join elements of the array in the loops
//like user inputs "hellodes"
//after echo "e", how to join the elements like: h-e,h-l,h-l,h-o,h-d,h-e,h-s
//and then h-e-l,h-e-l,h-e-o...etc...
?>

I hope to get an algorithm that is as simple as possible, because I'm a complete newbie. I'm sorry that my English is not so good. Best regards, Khiem Nguyen.

Answer

andrewsi · May 18, 2012

(I'm adding this as a separate answer, as it's a different way of dealing with the issue from the one I mentioned in my first answer.)

This is a more complex way of working out which words in the dictionary are part of the word that you're looking for; I'll leave it up to the reader to work out how it works.

It uses factorisation to work out whether one word is an anagram of another. Each letter is assigned a unique prime value, and the value of a word is the product of the values of its letters. CAT, for example, is 37 * 5 * 3, or 555. Because a number has only one prime factorisation, if your target word multiplies out to the same number, you can be sure that one is an anagram of the other.

I've assigned the primes in order of how common each letter is in UK English, so the most common letters get the smallest primes and the products stay smaller.
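To make the arithmetic concrete, here is a minimal sketch that maps only the three letters of CAT, using the same values as the table in the code below. The helper name letterProduct is made up for this illustration and is not part of the answer's code.

```php
<?php

// Hypothetical helper for illustration: multiply the prime assigned to each letter.
// Only three letters are mapped here; the values match the answer's full table.
function letterProduct($word)
{
    $primes = array("t" => 3, "a" => 5, "c" => 37);
    $total = 1;

    foreach (str_split(strtolower($word)) as $letter) {
        if (isset($primes[$letter])) {
            $total *= $primes[$letter];
        }
    }

    return $total;
}

$cat = letterProduct("cat"); // 37 * 5 * 3 = 555
$act = letterProduct("act"); // 5 * 37 * 3 = 555
$at  = letterProduct("at");  // 5 * 3 = 15

var_dump($cat === $act);     // bool(true): same letters, same product
var_dump($cat % $at === 0);  // bool(true): 15 divides 555, so "at" fits inside "cat"
```

Equality of the two products means an exact anagram; divisibility only means that every letter of the smaller word is available in the larger one.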

<?php

function factorise($word)
{
    // Take a word, split it into individual letters, and multiply those letters' values together
    // So long as both words use the same values, you can amend the ordering of the factors 
    // as you like

    $factors = array("e" => 2, "t" => 3, "a" => 5, "o" => 7, "i" => 11,
        "n" => 13, "s" => 17, "h" => 19, "r" => 23, "d" => 29,
        "l" => 31, "c" => 37, "u" => 41, "m" => 43, "w" => 47,
        "f" => 53, "g" => 59, "y" => 61, "p" => 67, "b" => 71,
        "v" => 73, "k" => 79, "j" => 83, "x" => 89, "q" => 97,
        "z" => 101);

    $total = 1;

    $letters = str_split($word);

    foreach ($letters as $thisLetter) {
        if (isset($factors[$thisLetter])) {
            // Characters with no entry in $factors (digits, punctuation, uppercase letters) are skipped.
            $total *= $factors[$thisLetter];
        }
    }

    return $total;
}

$searchWord = "hasted";

$dict = array("abde", "des", "klajsd", "ksj", "hat", "hats");

$searchWordFactor = factorise($searchWord);

foreach ($dict as $thisWord) {
    // Factorise each dictionary word in turn
    // If the word we've just factored is an exact divisor of the target word, then all the 
    // letters in that word are also present in the target word
    // If you want to do an exact anagram, then check that the two totals are equal

    $dictWordFactor = factorise($thisWord);

    if (($searchWordFactor % $dictWordFactor) == 0) {
        print ($thisWord . " can be made from the letters of " . $searchWord . "<br/>");
    }
}

For what it's worth, I think this is a much more elegant solution. You can speed it up by pre-calculating the values in your dictionary: if you work out the factor for every word in your dictionary and store it, you can do the searching directly in the database. For exact anagrams, that looks like:

SELECT word FROM dictionary WHERE wordFactor='$factorOfThisWord'
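The same pre-calculation idea can be tried without a database. The sketch below is only an illustration under a few assumptions: wordFactor and buildIndex are hypothetical names, wordFactor repeats the answer's letter values so that the block stands alone, and a plain PHP array stands in for the dictionary table.

```php
<?php

// Same idea as the answer's factorise(); renamed so this sketch stands alone.
function wordFactor($word)
{
    // Letters ordered from most to least common, paired with the first 26 primes
    // (the same letter => prime values as in the answer).
    $letters = "etaoinshrdlcumwfgypbvkjxqz";
    $primes = array(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101);
    $factors = array_combine(str_split($letters), $primes);

    $total = 1;
    foreach (str_split(strtolower($word)) as $letter) {
        if (isset($factors[$letter])) {
            $total *= $factors[$letter];
        }
    }

    return $total;
}

// Build the lookup once: factor => list of dictionary words with that factor.
function buildIndex(array $dict)
{
    $index = array();
    foreach ($dict as $word) {
        $index[wordFactor($word)][] = $word;
    }

    return $index;
}

$index = buildIndex(array("abde", "des", "klajsd", "ksj", "hat", "hats"));

// Exact anagrams of "tah" are whatever dictionary words share its factor.
$key = wordFactor("tah");
$matches = isset($index[$key]) ? $index[$key] : array();

print(implode(", ", $matches)); // hat
```

One caveat worth keeping in mind: for long words the product can exceed PHP_INT_MAX, at which point PHP switches to a float and both the equality test and the % test stop being reliable. Capping the word length, or using a big-integer extension, may be worth considering before storing these values.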