How to compare almost similar Strings in Java? (String distance measure)

hsmit · Jan 18, 2010 · Viewed 39.4k times

I would like to compare two strings and get a score for how much they look alike. For example, "The sentence is almost similar" and "The sentence is similar".

I'm not familiar with the existing methods in Java, but in PHP I know the levenshtein function.

Are there better methods in Java?

Answer

FiveO · Oct 7, 2011

The following Java libraries offer multiple string comparison algorithms (Levenshtein, Jaro-Winkler, ...):

  1. Apache Commons Lang 3: https://commons.apache.org/proper/commons-lang/
  2. Simmetrics: http://sourceforge.net/projects/simmetrics/

Both libraries have Javadoc documentation (Apache Commons Lang Javadoc, Simmetrics Javadoc).

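// Usage of Apache Commons Lang 3
import org.apache.commons.lang3.StringUtils;

public class CommonsLangExample {
    // Jaro-Winkler score from 0.0 (no match) to 1.0 (identical strings).
    // Deprecated since Commons Lang 3.6 in favor of Apache Commons Text.
    public double compareStrings(String stringA, String stringB) {
        return StringUtils.getJaroWinklerDistance(stringA, stringB);
    }
}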

// Usage of Simmetrics
import uk.ac.shef.wit.simmetrics.similaritymetrics.JaroWinkler;

public class SimmetricsExample {
    // Higher scores mean the two strings are more similar.
    public double compareStrings(String stringA, String stringB) {
        JaroWinkler algorithm = new JaroWinkler();
        return algorithm.getSimilarity(stringA, stringB);
    }
}
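
If adding a library is not an option, a Levenshtein-based score (the measure the question mentions from PHP) can be sketched with the JDK alone. This is a minimal sketch, not code from either library: the class name LevenshteinSimilarity, the method names, and the normalization by the longer string's length are all assumptions made for illustration.

```java
public class LevenshteinSimilarity {

    // Edit distance via dynamic programming, keeping only two rows of the table.
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building the first j chars of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing the first i chars of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization (hypothetical choice): 1.0 means identical,
    // 0.0 means the distance equals the length of the longer string.
    public static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(similarity("The sentence is almost similar",
                                      "The sentence is similar"));
    }
}
```

For the two sentences from the question, the edit distance is 7 (the characters of "almost " are deleted) over a longer length of 30, so the score comes out at roughly 0.77. Unlike Jaro-Winkler, this score gives no extra weight to a shared prefix, so the two measures can rank the same pairs differently.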