I have a question: can we normalize the Levenshtein edit distance by dividing the edit distance by the length of the two strings? I am asking because, when we compare two strings of unequal length, the difference between their lengths is counted as well. For example, ed('has a', 'has a ball') = 5 and ed('has a', 'has a ball the is round') = 18. If we increase the length of one string, the edit distance increases even though the strings are similar. Therefore, I cannot decide what a good edit distance threshold should be.
Yes, normalizing the edit distance is one way to put the differences between strings on a single scale from "identical" to "nothing in common".
A few things to consider:
To normalize to the interval [0, 1], you need to divide the distance by the maximum possible distance between two strings of the given lengths. That is, length(str1) + length(str2) for the LCS distance and max(length(str1), length(str2)) for the Levenshtein distance.
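As a rough illustration of the two normalizations, here is a minimal Python sketch. The function names (`levenshtein`, `lcs_distance`, `normalized_levenshtein`, `normalized_lcs_distance`) are made up for this example, unit costs are assumed for every edit, and two empty strings are treated as having distance 0 to avoid dividing by zero.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        previous = current
    return previous[-1]


def lcs_distance(a: str, b: str) -> int:
    """LCS distance: only insertions and deletions are allowed."""
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    lcs_length = previous[-1]
    return len(a) + len(b) - 2 * lcs_length


def normalized_levenshtein(a: str, b: str) -> float:
    """Divide by the maximum possible Levenshtein distance, max(len(a), len(b))."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def normalized_lcs_distance(a: str, b: str) -> float:
    """Divide by the maximum possible LCS distance, len(a) + len(b)."""
    total = len(a) + len(b)
    return lcs_distance(a, b) / total if total else 0.0


if __name__ == "__main__":
    print(normalized_levenshtein("kitten", "sitting"))   # 3 / 7
    print(normalized_lcs_distance("kitten", "sitting"))  # 5 / 13
    print(normalized_levenshtein("abc", "xyz"))          # 1.0, nothing in common
```

Both normalized values are 0.0 for identical strings and 1.0 for strings that share no characters, so a single threshold can be applied regardless of string length.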