Possible Duplicate:
C++ string::find complexity
Recently I noticed that the function std::string::find
is an order of magnitude slower than the function std::strstr
- in my environment with GCC 4.7 on Linux. The performance difference depends on the lengths of the strings and on the hardware architecture.
There seems to be a simple reason for the difference: std::string::find
basically calls std::memcmp
in a loop - with time complexity O(m * n)
. In contrast, std::strstr
is highly optimized for the hardware architecture (e.g. with SSE instructions) and uses a more sophisticated string matching algorithm (apparently Knuth-Morris-Pratt).
I was also surprised not to find the time complexities of these two functions in the language documents (i.e. drafts N3290 and N1570). I only found time complexities for char_traits
. But that doesn't help, because there is no function for substring search in char_traits
.
I would expect, that std::strstr
and memmem
contain similar optimizations with almost identical performance. And until recently, I assumed that std::string::find
uses memmem
internally.
The questions are: Is there any good reason, why std::string::find
does not use std::memmem
? And does it differ in other implementations?
The question is not: What is the best implementation of this function? It is really difficult to argue for C++, if it is slower than C. I wouldn't matter if both implementations would be slow. It is the performance difference that really hurts.
First, what's memmem
? I can't find this in the C++ standard, nor the
Posix standard (which contains all of the standard C functions).
Second, any measurement values will depend on the actual data. Using
KMP, for example, will be a pessimisation in a lot of cases; probably
most of the cases where the member functions of std::string
are used;
the time to set up the necessary tables will often be more than the
total time of the straightforeward algorithm. Things like O(m*n)
don't mean much when the typical length of the string is short.