Optimized strcmp implementation

Cody Smith · Nov 15, 2013 · Viewed 19k times

This function was found here. It's an implementation of strcmp:

int strcmp(const char* s1, const char* s2)
{
    while (*s1 && (*s1 == *s2))
        s1++, s2++;
    return *(const unsigned char*)s1 - *(const unsigned char*)s2;
}

I understand all but the last line. In short, what is going on in the last line?

Answer

chux - Reinstate Monica · Nov 15, 2013
return *(const unsigned char*)s1-*(const unsigned char*)s2;

OP: in short what is going on in the last line?

A: The loop stops at the first position where the strings differ, or at the terminating \0 if they are equal. The chars at that position are each read as unsigned char, as the C spec requires. Both values are then promoted to int, and their difference is returned.
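To make those steps visible, here is a minimal sketch that splits the last line into separate statements. The name my_strcmp and the local variables are hypothetical, chosen only for illustration and to avoid clashing with the library strcmp. The sketch assumes int is wider than char, which holds on typical platforms.

```c
/* Hypothetical name; behaves like the function in the question. */
int my_strcmp(const char *s1, const char *s2)
{
    /* Advance while the characters match and s1 has not reached its end. */
    while (*s1 && (*s1 == *s2)) {
        s1++;
        s2++;
    }

    /* Step 1: read the current bytes as unsigned char (0..UCHAR_MAX). */
    unsigned char c1 = *(const unsigned char *)s1;
    unsigned char c2 = *(const unsigned char *)s2;

    /* Step 2: promote both values to int (assumes int is wider than char). */
    int i1 = c1;
    int i2 = c2;

    /* Step 3: the difference is negative, zero, or positive. */
    return i1 - i2;
}
```

For example, my_strcmp("abc", "abd") stops at the third character and returns the value of 'c' minus the value of 'd', which is negative. my_strcmp("abc", "abc") stops at the terminator of both strings and returns 0.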


Notes:

1 The return value's sign (<0, 0, >0) is the most meaningful part. It is the only part that is specified by the C spec.

2 On some systems char is signed (the more common case); on others, char is unsigned. Fixing the signedness of the final comparison makes the result portable. Note that fgetc() also obtains characters as unsigned char.

3 Apart from the requirement that a string ends with \0, the character encoding in use (ASCII being the most common) makes no difference at the binary level. If the first chars that differ in 2 strings have the values 65 and 97, the first string compares less than the second, even if the character encoding is not ASCII. OTOH, strcmp("A", "a") returns a negative number when the character encoding is ASCII, but may return a positive number under a different encoding (in EBCDIC, for example, 'a' is 0x81 and 'A' is 0xC1), because C does not define the underlying values or order of letters.
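Notes 1 and 2 can be illustrated with a small sketch. The helper names diff_plain, diff_unsigned and sign_of are hypothetical and not part of any library. The sketch assumes 8-bit chars; the result of diff_plain for bytes above 0x7f depends on whether plain char is signed on the platform.

```c
/* Difference taken through plain char: for bytes above 0x7f the
   sign of the result depends on whether char is signed. */
int diff_plain(const char *a, const char *b)
{
    return *a - *b;
}

/* Difference taken through unsigned char, as in the question's last line:
   the same result whether plain char is signed or unsigned. */
int diff_unsigned(const char *a, const char *b)
{
    return *(const unsigned char *)a - *(const unsigned char *)b;
}

/* Reduce a comparison result to -1, 0 or 1, since only the sign
   of a strcmp result is specified. */
int sign_of(int r)
{
    return (r > 0) - (r < 0);
}
```

With a = "\x80" and b = "\x7f", diff_unsigned(a, b) is 128 - 127, a positive number. Where char is signed, diff_plain(a, b) typically comes out negative instead, because the byte 0x80 is read as a negative value. Code that needs a fixed value rather than just a sign can pass the result through sign_of.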