What does comparing strings lexicographically mean in Java?

B537B7725DC58715F6E6BFA7AFC20C · Sep 16, 2015 · Viewed 11.3k times

The compareTo() method in Java compares two strings "lexicographically". Can someone please explain in simple terms how the lexicographic comparison works in Java?

I found this post that explains the three cases of < 0, == 0, and > 0; however, I am still confused.

Does this mean that the int returned is the number of places away the strings are from one another if they were to be sorted alphabetically like a dictionary?

Also, how does the method deal with case sensitivity? Are lower case letters first in line before uppercase? Is there a chart for this?

For example, the code below produces an output of -31. Does this mean that the string Dog is -31 places away from the string cat?

public static void main(String[] args) {
     String str1 = "Dog";
     String str2 = "cat";

     // Compare the two strings lexicographically
     int result = str1.compareTo(str2);
     System.out.println(result); // prints -31
}

Answer

Jean-François Savard · Sep 16, 2015

The exact value returned does not really matter, as the compareTo contract only requires a negative number, a positive number, or 0 (as you already know).
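To illustrate that contract, here is a minimal sketch that relies only on the sign of the result. The class name CompareSignSketch and the describe helper are made up for this illustration.

```java
class CompareSignSketch {
    // Hypothetical helper: describes the relative order of two strings
    // using only the sign of compareTo, never its magnitude.
    static String describe(String a, String b) {
        int result = a.compareTo(b);
        if (result < 0) {
            return a + " sorts before " + b;
        }
        if (result > 0) {
            return a + " sorts after " + b;
        }
        return a + " equals " + b;
    }

    public static void main(String[] args) {
        System.out.println(describe("Dog", "cat")); // prints "Dog sorts before cat"
        System.out.println(describe("cat", "Dog")); // prints "cat sorts after Dog"
        System.out.println(describe("cat", "cat")); // prints "cat equals cat"
    }
}
```

Code written this way keeps working no matter which particular negative or positive number compareTo happens to return.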

However, if you really want to understand why -31 is returned when comparing Dog with cat (or any other pair of strings), you can simply look at the method in the String class:

public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;
    int lim = Math.min(len1, len2);
    char v1[] = value;
    char v2[] = anotherString.value;

    int k = 0;
    while (k < lim) {
        char c1 = v1[k];
        char c2 = v2[k];
        if (c1 != c2) {
            return c1 - c2;
        }
        k++;
    }
    return len1 - len2;
}

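Keep in mind that value is the char array backing the string (in the JDK version this source comes from; since Java 9 the backing field is a byte[], but the documented result of compareTo is the same).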

private final char value[];

So how does this method proceed?

  • You store the smaller of the two string lengths in a variable lim.
  • You take references to the char arrays of both strings (v1 and v2); no copy is made.
  • You loop over the characters index by index, checking whether they are equal, until you reach lim.
  • If the two characters at the same index are not equal, you return the result of subtracting the second one from the first (c1 - c2). A char can be treated as an int (its UTF-16 code unit value, which matches ASCII for basic Latin letters), and these values are already ordered. So the subtraction yields a negative number if the second char is "higher" than the first and a positive number if the second char is "lower" than the first. It never yields 0 here, because this branch only runs when the characters differ.
  • If all characters were equal up to lim, you return the difference of the two lengths (len1 - len2), which is 0 only when the strings are identical.
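The steps above can be sketched with public String methods only. This is an illustration of the same logic, not the JDK source; the class name LexCompareSketch and the helper compareLexicographically are made up for this example.

```java
class LexCompareSketch {
    // Hypothetical helper mirroring the steps above with charAt() and length()
    // instead of the private backing array.
    static int compareLexicographically(String a, String b) {
        int lim = Math.min(a.length(), b.length());
        for (int k = 0; k < lim; k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 != c2) {
                // First mismatch decides: difference of the two char values.
                return c1 - c2;
            }
        }
        // One string is a prefix of the other (or they are equal).
        return a.length() - b.length();
    }

    public static void main(String[] args) {
        System.out.println(compareLexicographically("Dog", "cat"));   // prints -31
        System.out.println("Dog".compareTo("cat"));                   // prints -31
        System.out.println(compareLexicographically("app", "apple")); // prints -2
        System.out.println(compareLexicographically("cat", "cat"));   // prints 0
    }
}
```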

In your example, the first letters of the two words are not equal, so you compare D with c, which are represented as 68 and 99 respectively. Subtract 99 from 68 and you get -31.
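If you want to check that arithmetic yourself, a small sketch like the following prints the numeric values involved. The class name CharValueSketch and the charDifference helper are made up for this example.

```java
class CharValueSketch {
    // Hypothetical helper: both chars are widened to int before subtracting,
    // just like "return c1 - c2" in the method above.
    static int charDifference(char c1, char c2) {
        return c1 - c2;
    }

    public static void main(String[] args) {
        System.out.println((int) 'D');                // prints 68
        System.out.println((int) 'c');                // prints 99
        System.out.println(charDifference('D', 'c')); // prints -31
    }
}
```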

So to answer this question:

Does this mean that the int returned is the number of places away the strings are from one another if they were to be sorted alphabetically like a dictionary?

No, it is either the difference between the values of the first two non-matching chars (their ASCII values, for plain English letters) or the difference between the two lengths.
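As a rough sketch of those two cases, the example below finds where two strings first differ and prints the built-in result next to it. The class name MismatchSketch and the firstMismatch helper are hypothetical names used only here.

```java
class MismatchSketch {
    // Hypothetical helper: index of the first differing char, or -1 when
    // one string is a prefix of the other (or both are equal).
    static int firstMismatch(String a, String b) {
        int lim = Math.min(a.length(), b.length());
        for (int k = 0; k < lim; k++) {
            if (a.charAt(k) != b.charAt(k)) {
                return k;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Chars differ at index 0, so the result is 'D' - 'c'.
        System.out.println(firstMismatch("Dog", "cat"));   // prints 0
        System.out.println("Dog".compareTo("cat"));        // prints -31

        // "app" is a prefix of "apple", so the result is 3 - 5.
        System.out.println(firstMismatch("app", "apple")); // prints -1
        System.out.println("app".compareTo("apple"));      // prints -2
    }
}
```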

Also, how does the method deal with case sensitivity? Are lower case letters first in line before uppercase? Is there a chart for this?

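compareTo is case-sensitive: uppercase letters (A to Z, values 65 to 90) have lower values than lowercase letters (a to z, values 97 to 122), so uppercase comes first and Dog sorts before cat. If you want to ignore the case when comparing, you can use String#compareToIgnoreCase.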

You can also check this chart for ASCII values (upper and lower case).
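To see the effect of case on ordering, here is a small sketch that sorts the same words with and without case sensitivity. The class name CaseOrderSketch and its two helpers are made up for this example; Arrays.sort and String.CASE_INSENSITIVE_ORDER are standard JDK APIs.

```java
import java.util.Arrays;

class CaseOrderSketch {
    // Hypothetical helper: sorts a copy using the natural order (String.compareTo).
    static String[] sortedCaseSensitive(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Hypothetical helper: sorts a copy using the JDK's case-insensitive comparator.
    static String[] sortedIgnoringCase(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"cat", "Dog", "apple", "Bird"};

        // Uppercase letters have lower values, so capitalized words come first.
        System.out.println(Arrays.toString(sortedCaseSensitive(words))); // prints [Bird, Dog, apple, cat]

        // Ignoring case gives the usual dictionary order.
        System.out.println(Arrays.toString(sortedIgnoringCase(words)));  // prints [apple, Bird, cat, Dog]

        // Ignoring case, "Dog" now sorts after "cat", so the result is positive.
        System.out.println("Dog".compareToIgnoreCase("cat") > 0);        // prints true
    }
}
```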