How can I measure the similarity between 2 strings?

Zanoni picture Zanoni · Jun 23, 2009 · Viewed 40.2k times · Source

Given two strings text1 and text2:

public SOMEUSABLERETURNTYPE Compare(string text1, string text2)
{
     // DO SOMETHING HERE TO COMPARE
}

Examples:

  1. First String: StackOverflow

    Second String: StaqOverflow

    Return: Similarity is 91%

    The return can be in % or something like that.

  2. First String: The simple text test

    Second String: The complex text test

    Return: The values can be considered equal

Any ideas? What is the best way to do this?

Answer

Jon Skeet picture Jon Skeet · Jun 23, 2009

There are various different ways of doing this. Have a look at the Wikipedia "String similarity measures" page for links to other pages with algorithms.

I don't think any of those algorithms take sounds into consideration, however - so "staq overflow" would be as similar to "stack overflow" as "staw overflow" despite the first being more similar in terms of pronunciation.

I've just found another page which gives rather more options... in particular, the Soundex algorithm (Wikipedia) may be closer to what you're after.