What's wrong with strcmp?

David Cary picture David Cary · Jun 22, 2014 · Viewed 32.7k times · Source

In the responses to the question Reading In A String and comparing it C, more than one person discouraged the use of strcmp(), saying things like

I also strongly, strongly advise you to get used to using strncmp() now, ... to avoid many problems down the road.

or (in Why does my string comparison fail? )

Make certain you use strncmp and not strcmp. strcmp is profoundly unsafe.

What problems are they alluding to?

The reason scanf() with string specifiers and gets() are strongly discouraged is because they almost inevitably lead to buffer overflow vulnerabilities. However, it's not possible to overflow a buffer with strcmp(), right?

"A buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory."

( -- Wikipedia: buffer overflow).

Since the strcmp() function never writes to any buffer, the strcmp() function cannot cause a buffer overflow, right?

What is the reason people discourage the use of strcmp(), and recommend strncmp() instead?

Answer

Jonathon Reinhart picture Jonathon Reinhart · Jun 22, 2014

While strncmp can prevent you from overrunning a buffer, its primary purpose isn't for safety. Rather, it exists for the case where one wants to compare only the first N characters of a (properly possibly NUL-terminated) string.

From the man page:

The strcmp() function compares the two strings s1 and s2. It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.

The strncmp() function is similar, except it compares the only first (at most) n bytes of s1 and s2.

Note that strncmp in this case cannot be replaced with a simple memcmp, because you still need to take advantage of its stop-on-NUL behavior, in case one of the strings is shorter than n.

If strcmp causes a buffer overrun, then one of two things is true:

  1. Your data isn't expected to be NUL-terminated, and you should be using memcmp instead.
  2. Your data is expected to be NUL-terminated, but you've already screwed up when you populated the buffer, by somehow not NUL-terminating it.

Note that reading past the end of a buffer is still considered a buffer overrun. While it may seem harmless, it can be just as dangerous as writing past the end.

Reading, writing, executing... it doesn't matter. Any memory reference to an unintended address is undefined behavior. In the most apparent scenario, you attempt to access a page that isn't mapped into your process's address space, causing a page fault, and subsequent SIGSEGV. In the worst case, you sometimes run into a \0 byte, but other times you run into some other buffer, causing inconstant program behavior.