The compareTo() method in Java compares two strings "lexicographically". Can someone please simply explain how the lexicographic comparison works in Java?
I found this post that explains the three cases of <0, ==0, and >0; however, I am still confused...
Does this mean that the int returned is the number of places away the strings are from one another if they were to be sorted alphabetically like a dictionary?
Also, how does the method deal with case sensitivity? Are lower case letters first in line before uppercase? Is there a chart for this?
For example, the code below produces an output of -31. Does this mean that the string Dog is -31 places away from the string cat?
public class Main {
    public static void main(String[] args) {
        String str1 = "Dog";
        String str2 = "cat";
        int result = str1.compareTo(str2);
        System.out.println(result); // prints -31
    }
}
The value returned does not really matter, as the compareTo contract is only to return a negative number, a positive number, or 0 (as you already know). However, if you really want to understand why -31 is returned when comparing Dog with cat (or any other string), you can simply look at the method directly in the String class:
public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;
    int lim = Math.min(len1, len2);
    char v1[] = value;
    char v2[] = anotherString.value;
    int k = 0;
    while (k < lim) {
        char c1 = v1[k];
        char c2 = v2[k];
        if (c1 != c2) {
            return c1 - c2;
        }
        k++;
    }
    return len1 - len2;
}
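Keep in mind that value is the char array backing the string. (This is the source from Java 8 and earlier; since Java 9 the backing array is a byte[], but the comparison yields the same results.)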
private final char value[];
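So how does this method proceed? It stores the shorter of the two lengths in lim and compares the two arrays index by index up to lim. At the first index where the characters differ, it returns c1 - c2; if no character differs within lim, it returns len1 - len2.
A char can be represented as an int value (its UTF-16 code unit, which matches the ASCII value for these letters), and those values are already ordered. Thus the subtraction yields a negative number if the second char is "higher" than the first one and a positive number if the second char is "lower" than the first one. 0 is returned only when every character matches and both lengths are equal. In your example, the first letters of the two words are not equal, so D is compared with c, which are represented as 68 and 99 respectively. Subtract 99 from 68 and you get -31.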
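To make the arithmetic concrete, here is a minimal sketch that re-creates the loop shown above using only public String methods. The class name CharDifferenceDemo and the helper lexicographicCompare are hypothetical names chosen for illustration; they are not part of the JDK.

```java
public class CharDifferenceDemo {
    // Hypothetical helper: same logic as the String source quoted above,
    // written against charAt() instead of the private backing array.
    public static int lexicographicCompare(String a, String b) {
        int lim = Math.min(a.length(), b.length());
        for (int k = 0; k < lim; k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 != c2) {
                return c1 - c2; // both chars are promoted to int before subtracting
            }
        }
        return a.length() - b.length();
    }

    public static void main(String[] args) {
        System.out.println((int) 'D');                          // 68
        System.out.println((int) 'c');                          // 99
        System.out.println('D' - 'c');                          // -31
        System.out.println(lexicographicCompare("Dog", "cat")); // -31
        System.out.println("Dog".compareTo("cat"));             // -31
    }
}
```

The comparison stops at index 0 here, so the remaining letters (og and at) never influence the result.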
So to answer this question:
Does this mean that the int returned is the number of places away the strings are from one another if they were to be sorted alphabetically like a dictionary?
No, it is either the difference between the values of the first two non-matching chars or, when one string is a prefix of the other, the difference between the two lengths.
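As a rough illustration of both cases, the sketch below prints one result that comes from a character difference and one that comes from a length difference, then reduces a result to its sign, which is the only part the contract guarantees. The class name CompareSignDemo and the helper order are hypothetical.

```java
public class CompareSignDemo {
    // Hypothetical helper: keeps only the sign of the compareTo result.
    public static String order(String a, String b) {
        int sign = Integer.signum(a.compareTo(b));
        if (sign < 0) return a + " sorts before " + b;
        if (sign > 0) return a + " sorts after " + b;
        return a + " equals " + b;
    }

    public static void main(String[] args) {
        // First mismatch at index 0: 'D' (68) - 'c' (99)
        System.out.println("Dog".compareTo("cat"));     // -31
        // No mismatch within the shorter string, so the lengths decide: 3 - 7
        System.out.println("cat".compareTo("catalog")); // -4
        // Identical strings
        System.out.println("cat".compareTo("cat"));     // 0
        System.out.println(order("Dog", "cat"));        // Dog sorts before cat
    }
}
```

Code that depends on the exact magnitude (-31 or -4) is relying on an implementation detail; checking the sign is the safer habit.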
Also, how does the method deal with case sensitivity? Are lower case letters first in line before uppercase? Is there a chart for this?
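compareTo is case sensitive: uppercase letters (A-Z, 65-90) have lower values than lowercase letters (a-z, 97-122), so every uppercase letter sorts before every lowercase one. If you want to ignore the case when comparing, you can use String#compareToIgnoreCase.
You can also check this chart for ASCII values (upper and lower case).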
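A small sketch of the difference case makes when sorting, assuming a made-up word list; the class name CaseOrderDemo and its two helper methods are hypothetical. String.CASE_INSENSITIVE_ORDER is the standard comparator that orders strings the same way compareToIgnoreCase does.

```java
import java.util.Arrays;

public class CaseOrderDemo {
    // Sorts a copy using natural (case-sensitive) String ordering.
    public static String[] sortedCaseSensitive(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Sorts a copy using the JDK's case-insensitive comparator.
    public static String[] sortedIgnoringCase(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"cat", "Dog", "apple", "Zebra"};
        // Uppercase initials come first because their char values are lower.
        System.out.println(Arrays.toString(sortedCaseSensitive(words))); // [Dog, Zebra, apple, cat]
        System.out.println(Arrays.toString(sortedIgnoringCase(words)));  // [apple, cat, Dog, Zebra]
        System.out.println("Dog".compareTo("cat") < 0);                  // true
        System.out.println("Dog".compareToIgnoreCase("cat") > 0);        // true
    }
}
```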