How to search for Soundex() substrings in MySQL?

user1014412 picture user1014412 · Dec 21, 2012 · Viewed 11.8k times · Source

i got a problem with the Joomla! 3 integrated search engine. This engine's indexer creates so called soundex-values when indexing content like, for example

Testobject, Testobject 1, Testobject 2239923, Textobject ....

which all have the same soundex-value of T23123.

Now my problem is, if i do a search for Test, then there won't be any results since the soundex-value for this term is T230.

The query used by the search engine is:

SELECT DISTINCT t.term_id AS id, t.term AS term
FROM tablename AS t
WHERE t.soundex = SOUNDEX('test')

I checked the soundex_match function in this topic, but unfortunately this cannot resolve my problem, because it does not compare soundex values.

I want to avoid hacking the cms core and would like understand if there is some kind of approximation procedure available to compare soundex-values like for regular queries when using the % symbol which i could then try to implement using a plugin or whatever.

The MSSQL DIFFERENCE function mentioned here would be ideal, if it would be available in MySQL and ready to use a soundex value as second parameter.

I am not very well experienced in MySQL and have no idea how to improve the query to also match soundex-substrings.

Answer

eggyal picture eggyal · Dec 21, 2012

You're probably looking to calculate the Levenshtein distance; but if you simply want to find those records that start with something that sounds similar to the search term, you can strip any trailing 0 (which is merely used for padding) and then search for soundex strings with the resulting prefix:

WHERE t.soundex LIKE CONCAT(TRIM(TRAILING '0' FROM SOUNDEX('test')), '%')