I'm using the Levenshtein algorithm to find the similarity between two strings. This is a very important part of the program I'm making, so it needs to be effective. The problem is that the algorithm doesn't find the following examples as similar:
CONAIR
AIRCON
The algorithm will give a distance of 6. So for this word of 6 letters (You look at the word with the highest amount of letters), the difference is of 100% => the similarity is 0%.
I need to find a way to find the similarities between two string, but also taking into consideration cases like the one I presented before.
Is there a better algorithm I can use? Or what do you guys recommend me?
EDIT: I've also looked into the "Damerau–Levenshtein" algorithm, which adds transpositions. The problem is that this transpositions are only for adjacent characters (and not for a number of characters).
I would divide the term into unigrams, bigrams and trigrams, then calculate cosine similarity.