Why is strcmp not SIMD optimized?

user1095108 · Oct 27, 2014

I've tried to compile this program on an x64 computer:

#include <cstring>

int main(int argc, char* argv[])
{
  return ::std::strcmp(argv[0],
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really long string"
  );
}

I compiled it like this:

g++ -std=c++11 -msse2 -O3 -g a.cpp -o a

But the resulting disassembly is like this:

   0x0000000000400480 <+0>:     mov    (%rsi),%rsi
   0x0000000000400483 <+3>:     mov    $0x400628,%edi
   0x0000000000400488 <+8>:     mov    $0x22d,%ecx
   0x000000000040048d <+13>:    repz cmpsb %es:(%rdi),%ds:(%rsi)
   0x000000000040048f <+15>:    seta   %al
   0x0000000000400492 <+18>:    setb   %dl
   0x0000000000400495 <+21>:    sub    %edx,%eax
   0x0000000000400497 <+23>:    movsbl %al,%eax
   0x000000000040049a <+26>:    retq 

Why is no SIMD used? I assumed SIMD could be used to compare, say, 16 chars at once. Should I write my own SIMD strcmp, or is that a nonsensical idea for some reason?

Answer

Nils Pipenbrinck · Oct 27, 2014

In an SSE2 implementation, how would the compiler make sure that no memory access reaches past the end of the string? It has to know the length first, and that requires scanning the string for the terminating zero byte.

If you scan for the length of the string, you have already done most of the work of a strcmp function, so there is no benefit to using SSE2.
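To make that argument concrete, here is a minimal sketch in portable C++ of what "know the length first" would look like. The helper name two_pass_strcmp is hypothetical (it is not a library function), and this is only an illustration of the two-pass idea, not how any particular compiler or C library implements strcmp.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration only: compare two C strings by
// measuring them first, so the comparison never reads past a terminator.
int two_pass_strcmp(const char* a, const char* b)
{
  // Pass 1: find both terminating zero bytes (one full scan per string).
  const std::size_t len_a = std::strlen(a);
  const std::size_t len_b = std::strlen(b);

  // Pass 2: compare the common prefix plus one more byte. In the shorter
  // string that extra byte is its terminator, so a string that is a
  // prefix of the other compares as smaller.
  const std::size_t n = std::min(len_a, len_b) + 1;
  return std::memcmp(a, b, n);
}
```

With the length known, the second pass could safely work on fixed-size blocks, but the two strlen calls have already touched every byte up to each terminator, which is about as much work as a byte-by-byte strcmp would have done in the first place.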

However, Intel added string-handling instructions in the SSE4.2 instruction set. These handle the terminating zero byte problem. For a nice write-up on them, read this blog post:

http://www.strchr.com/strcmp_and_strlen_using_sse_4.2