Why is strcmp not SIMD optimized?

Question 1

Why is strcmp not SIMD optimized?

c++ sse simd strcmp sse2

user1095108 · Oct 27, 2014 · Viewed 8.9k times · Source

Answer

Answer

In a SSE2 implementation, how should the compiler make sure that no memory accesses happen over the end of the string? It has to know the length first and this requires scanning the string for the terminating zero byte.

If you scan for the length of the string you have already accomplished most of the work of a strcmp function. Therefore there is no benefit to use SSE2.

However, Intel added instructions for string handling in the SSE4.2 instruction set. These handle the terminating zero byte problem. For a nice write-up on them read this blog-post:

http://www.strchr.com/strcmp_and_strlen_using_sse_4.2

Question 2

I've tried to compile this program on an x64 computer:

#include <cstring>

int main(int argc, char* argv[])
{
  return ::std::strcmp(argv[0],
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really really really"
    "really really really really really really really long string"
  );
}

I compiled it like this:

g++ -std=c++11 -msse2 -O3 -g a.cpp -o a

But the resulting disassembly is like this:

   0x0000000000400480 <+0>:     mov    (%rsi),%rsi
   0x0000000000400483 <+3>:     mov    $0x400628,%edi
   0x0000000000400488 <+8>:     mov    $0x22d,%ecx
   0x000000000040048d <+13>:    repz cmpsb %es:(%rdi),%ds:(%rsi)
   0x000000000040048f <+15>:    seta   %al
   0x0000000000400492 <+18>:    setb   %dl
   0x0000000000400495 <+21>:    sub    %edx,%eax
   0x0000000000400497 <+23>:    movsbl %al,%eax
   0x000000000040049a <+26>:    retq

Why is no SIMD used? I suppose it could be to compare, say, 16 chars at once. Should I write my own SIMD strcmp, or is it a nonsensical idea for some reason?

Why is strcmp not SIMD optimized?

Answer

Related questions