Matching inexact company names in Java

Sophie Gage picture Sophie Gage · Nov 27, 2008 · Viewed 7.6k times · Source

I have a database of companies. My application receives data that references a company by name, but the name may not exactly match the value in the database. I need to match the incoming data to the company it refers to.

For instance, my database might contain a company with name "A. B. Widgets & Co Ltd." while my incoming data might reference "AB Widgets Limited", "A.B. Widgets and Co", or "A B Widgets".

Some words in the company name (A B Widgets) are more important for matching than others (Co, Ltd, Inc, etc). It's important to avoid false matches.

The number of companies is small enough that I can maintain a map of their names in memory, ie. I have the option of using Java rather than SQL to find the right name.

How would you do this in Java?

Answer

Drew Hall picture Drew Hall · Nov 27, 2008

You could standardize the formats as much as possible in your DB/map & input (i.e. convert to upper/lowercase), then use the Levenshtein (edit) distance metric from dynamic programming to score the input against all your known names.

You could then have the user confirm the match & if they don't like it, give them the option to enter that value into your list of known names (on second thought--that might be too much power to give a user...)