How to create simple fuzzy search with PostgreSQL only?

Alve · Oct 11, 2011 · Viewed 38.8k times

I have a small problem with the search functionality on my RoR-based site. I have many Products, each with a code. A code can be any string, such as "AB-123-lHdfj". Currently I use the ILIKE operator to find products:

Product.where("code ILIKE ?", "%" + params[:search] + "%")

It works fine, but it can't find products with codes like "AB123-lHdfj" or "AB123lHdfj".

What should I do about this? Maybe Postgres has a string normalization function, or some other method that could help?

Answer

Paul Sasik · Oct 11, 2011

Postgres provides a module (fuzzystrmatch) with several string comparison functions, such as soundex and metaphone. But you will want to use the levenshtein edit distance function.

Example:

test=# SELECT levenshtein('GUMBO', 'GAMBOL');
 levenshtein
-------------
           2
(1 row)

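The 2 is the edit distance between the two words: the minimum number of single-character insertions, deletions, or substitutions needed to turn one into the other. When you apply this against a number of words and sort by the edit distance, you get the type of fuzzy matches that you're looking for.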
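To get a feel for what the number means, here is a minimal pure-Ruby sketch of the same calculation. The method name edit_distance is made up for illustration; in practice the database's levenshtein() does this work, and this is only one common way to compute it.

```ruby
# Illustration only: a hypothetical helper, not part of fuzzystrmatch or Rails.
# Computes the Levenshtein edit distance row by row.
def edit_distance(a, b)
  # prev[j] = distance between the prefix of `a` handled so far
  # and the first j characters of `b`.
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i] # turning i characters into an empty string costs i deletions
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [
        prev[j] + 1,       # delete ca
        curr[j - 1] + 1,   # insert cb
        prev[j - 1] + cost # substitute ca with cb (free if they match)
      ].min
    end
    prev = curr
  end
  prev.last
end

puts edit_distance('GUMBO', 'GAMBOL')             # 2, same pair as the psql example
puts edit_distance('AB-123-lHdfj', 'AB123-lHdfj') # 1, one hyphen removed
puts edit_distance('AB-123-lHdfj', 'AB123lHdfj')  # 2, both hyphens removed
```

The last two lines use the codes from the question: dropping hyphens costs one edit each, so those variants stay very close to the stored code.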

Try this sample query (with your own object names and data, of course):

SELECT *
FROM some_table
WHERE levenshtein(code, 'AB123-lHdfj') <= 3
ORDER BY levenshtein(code, 'AB123-lHdfj')
LIMIT 10;

This query says:

Give me up to 10 rows from some_table where the edit distance between the code value and the input 'AB123-lHdfj' is at most 3, closest matches first. You will get back rows where the value of code is within 3 single-character edits of 'AB123-lHdfj'.
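If it helps to see the filter, sort, and limit steps spelled out, this is a rough in-memory equivalent in Ruby. Everything here is hypothetical and for illustration: fuzzy_matches is a made-up name, and distance is assumed to be any callable that returns an edit distance for two strings. The real work should stay in the database query.

```ruby
# Illustration only: mirrors WHERE / ORDER BY / LIMIT on a plain array.
# `distance` is an assumed callable, e.g. a lambda wrapping a Levenshtein function.
def fuzzy_matches(codes, query, distance, max_distance: 3, limit: 10)
  codes
    .map { |code| [code, distance.call(code, query)] } # compute each distance once
    .select { |_code, d| d <= max_distance }           # WHERE levenshtein(...) <= 3
    .sort_by { |_code, d| d }                          # ORDER BY levenshtein(...)
    .first(limit)                                      # LIMIT 10
    .map(&:first)                                      # keep only the codes
end
```

Rows that tie on distance may come back in any order, just as they can in the SQL version, since ORDER BY gives no guarantee for ties.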

Note: if you get an error like:

function levenshtein(character varying, unknown) does not exist

Install the fuzzystrmatch extension using:

test=# CREATE EXTENSION fuzzystrmatch;