String similarity algorithms?

Robin Rodricks picture Robin Rodricks · Aug 26, 2010 · Viewed 53.5k times · Source

I need to compare 2 strings and calculate their similarity, to filter down a list of the most similar strings.

Eg. searching for "dog" would return

  1. dog
  2. doggone
  3. bog
  4. fog
  5. foggy

Eg. searching for "crack" would return

  1. crack
  2. wisecrack
  3. rack
  4. jack
  5. quack

I have come across:

Do you know of any more string similarity algorithms?

Answer

Peter picture Peter · Aug 26, 2010

The Levenshtein distance is the algorithm I would recommend. It calculates the minimum number of operations you must do to change 1 string into another. The fewer changes means the strings are more similar...