How do I convert between a measure of similarity and a measure of difference (distance)?

135498 · Oct 31, 2010

Is there a general way to convert between a measure of similarity and a measure of distance?

Consider a similarity measure like the number of 2-grams that two strings have in common.

2-grams('beta', 'delta') = 1
2-grams('apple', 'dappled') = 4
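To make the similarity measure concrete, here is a minimal Python sketch that reproduces the two counts above. It assumes a 2-gram is a pair of adjacent characters and that each distinct 2-gram is counted once (set intersection). The function names `bigrams` and `shared_bigrams` are illustrative, not from any library.

```python
def bigrams(s):
    """Return the set of distinct adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def shared_bigrams(a, b):
    """Similarity: number of distinct 2-grams that a and b have in common."""
    return len(bigrams(a) & bigrams(b))


print(shared_bigrams("beta", "delta"))     # 1  ("ta")
print(shared_bigrams("apple", "dappled"))  # 4  ("ap", "pp", "pl", "le")
```

Counting repeated 2-grams with multiplicity would be another reasonable choice. Both give the same result for these two examples.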

What if I need to feed this to an optimization algorithm that expects a measure of difference, like Levenshtein distance?
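For comparison, a minimal sketch of Levenshtein distance using the standard dynamic-programming recurrence, keeping only one previous row of the table. The function name `levenshtein` is our own choice for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # prev[j] = distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca from a
                            curr[j - 1] + 1,      # insert cb into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


print(levenshtein("beta", "delta"))     # 2
print(levenshtein("apple", "dappled"))  # 2
```

Note the opposite orientation: a larger value means the strings are *less* alike, and identical strings score 0.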

This is just an example. I'm looking for a general solution, if one exists. For instance, how would I go from Levenshtein distance to a measure of similarity?

I appreciate any guidance you may offer.

Answer

henrythung · Mar 12, 2015

Let d denote distance and s denote similarity. To convert a distance measure to a similarity measure, first normalize d to the range [0, 1] with d_norm = d/max(d), where max(d) is the largest distance among the items being compared. The similarity measure is then:

s = 1 - d_norm.

where s is in the range [0, 1], with 1 denoting the highest similarity (the items being compared are identical) and 0 denoting the lowest similarity (the largest distance).
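A minimal Python sketch of this normalize-and-subtract conversion, assuming all distances (or similarities) are non-negative numbers collected for the whole set of pairs being compared. The reverse direction, applying the same formula to a similarity such as the shared 2-gram count, is our own extension of the answer, and the function names are illustrative.

```python
def distances_to_similarities(distances):
    """Convert non-negative distances to similarities in [0, 1]: s = 1 - d / max(d)."""
    d_max = max(distances)
    if d_max == 0:
        # All distances are zero, so every pair is identical: maximal similarity.
        return [1.0] * len(distances)
    return [1.0 - d / d_max for d in distances]


def similarities_to_distances(similarities):
    """Reverse direction, assuming non-negative similarities: d = 1 - s / max(s)."""
    s_max = max(similarities)
    if s_max == 0:
        # No pair shares anything, so every pair is maximally distant.
        return [1.0] * len(similarities)
    return [1.0 - s / s_max for s in similarities]


print(distances_to_similarities([0, 2, 4]))  # [1.0, 0.5, 0.0]
print(similarities_to_distances([1, 4]))     # [0.75, 0.0] for the 2-gram counts in the question
```

Because both directions divide by the maximum over the collection, the scores are relative: adding a pair with a larger distance (or a higher similarity) changes every normalized value. In the reverse direction, the most similar pair gets distance 0 even if its strings are not identical.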