PHP/MySQL: Highlight "SOUNDS LIKE" query results

Greg picture Greg · Nov 22, 2009 · Viewed 12.7k times · Source

Quick MYSQL/PHP question. I'm using a "not-so-strict" search query as a fallback if no results are found with a normal search query, to the tune of:

foreach($find_array as $word) { 
  clauses[] = "(firstname SOUNDS LIKE '$word%' OR lastname SOUNDS LIKE '$word%')";
}
if (!empty($clauses)) $filter='('.implode(' AND ', $clauses).')';
$query = "SELECT * FROM table WHERE $filter";

Now, I'm using PHP to highlight the results, like:

foreach ($find_array as $term_to_highlight){
    foreach ($result as $key => $result_string){
        $result[$key]=highlight_stuff($result_string, $term_to_highlight);
    }
}

But this method falls on its ass when I don't know what to highlight. Is there any way to find out what the "sound-alike" match is when running that mysql query?

That is to say, if someone searches for "Joan" I want it to highlight "John" instead.

Answer

Vegard Larsen picture Vegard Larsen · Nov 23, 2009

Note that SOUNDS LIKE does not work as you think it does. It is not equivalent to LIKE in MySQL, as it does not support the % wildcard.

This means your query will not find "John David" when searching for "John". This might be acceptable if this is just your fallback, but it is not ideal.

So here is a different suggestion (that might need improvement); first use PHPs soundex() function to find the soundex of the keyword you are looking for.

$soundex = soundex($word);
$soundexPrefix = substr($soundex, 0, 2); // first two characters of soundex
$sql = "SELECT lastname, firstname ".
    "FROM table WHERE SOUNDEX(lastname) LIKE '$soundexPrefix%' ".
    "OR SOUNDEX(firstname) LIKE '$soundexPrefix%'";

Now you'll have a list of firstnames and lastnames that has a vague similarity in sounding (this might be a lot entries, and you might want to increase the length of the soundex prefix you use for your search). You can then calculate the Levenshtein distance between the soundex of each word and your search term, and sort by that.

Second, you should look at parameterized queries in MySQL, to avoid SQL injection bugs.