What are the differences between strtok and strsep in C

mizuki · Aug 28, 2011 · Viewed 26.7k times

Could someone explain to me what the differences are between strtok() and strsep()? What are their advantages and disadvantages? And why would I pick one over the other?

Answer

Jonathan Leffler · Aug 28, 2011

One major difference between strtok() and strsep() is that strtok() is standardized (by the C standard, and hence also by POSIX) but strsep() is not standardized (by C or POSIX; it is available in the GNU C Library, and originated on BSD). Thus, portable code is more likely to use strtok() than strsep().

Another difference is that calls to the strsep() function on different strings can be interleaved, whereas you cannot do that with strtok() (though you can with strtok_r()). So, using strsep() in a library doesn't break other code accidentally, whereas using strtok() in a library function must be documented because other code using strtok() at the same time cannot call the library function.
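
To make the interleaving point concrete, here is a minimal sketch that splits "key=value" fields separated by semicolons, using an outer and an inner strsep() call at the same time. The struct kv type and the parse_pairs() name are made up for this illustration, and the _DEFAULT_SOURCE define is what glibc needs to declare strsep(); other C libraries may want something different.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* assumption: glibc; asks it to declare strsep() */
#endif
#include <string.h>

/* Hypothetical record type, used only in this sketch. */
struct kv {
    const char *key;
    const char *value;      /* NULL when the field has no '=' */
};

/* Splits "k1=v1;k2=v2" in place and returns the number of fields stored.
 * The outer and inner strsep() calls are interleaved; that works because
 * each one keeps its position in its own pointer variable instead of in
 * hidden static state. */
static int parse_pairs(char *input, struct kv *out, int max)
{
    int n = 0;
    char *rest = input;             /* cursor for the outer split on ';' */
    char *field;

    while (n < max && (field = strsep(&rest, ";")) != NULL) {
        char *cursor = field;       /* cursor for the inner split on '=' */
        out[n].key = strsep(&cursor, "=");
        out[n].value = cursor;      /* remainder after '=', or NULL */
        n++;
    }
    return n;
}
```

Writing the same nesting with plain strtok() would fail, because the inner call would overwrite the position saved by the outer one.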

The manual page for strsep() at kernel.org says:

The strsep() function was introduced as a replacement for strtok(3), since the latter cannot handle empty fields.

Thus, the other major difference is the one highlighted by George Gaál in his answer: strtok() permits multiple delimiters between tokens and treats a run of delimiters as a single separator, whereas strsep() expects a single delimiter between tokens and interprets adjacent delimiters as an empty token.
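
A small sketch of that difference: the two helpers below (hypothetical names) simply count the tokens each function reports for the same input. strtok() is used here only to demonstrate its behaviour, and the _DEFAULT_SOURCE define is an assumption about glibc.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* assumption: glibc; asks it to declare strsep() */
#endif
#include <string.h>

/* Counts tokens the strtok() way: runs of delimiters collapse into one
 * separator, so empty fields are never reported. */
static int count_fields_strtok(char *s, const char *delims)
{
    int n = 0;
    for (char *tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}

/* Counts tokens the strsep() way: every delimiter ends a field, so two
 * adjacent delimiters produce an empty token between them. */
static int count_fields_strsep(char *s, const char *delims)
{
    int n = 0;
    while (strsep(&s, delims) != NULL)
        n++;
    return n;
}
```

For the input "a,,b" with "," as the delimiter, the strtok() version counts 2 tokens and the strsep() version counts 3, the middle one being the empty string.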

Both strsep() and strtok() modify their input strings and neither lets you identify which delimiter character marked the end of the token (because both write a NUL '\0' over the separator after the end of the token).

When to use them?

  • You would use strsep() when you want empty tokens rather than allowing multiple delimiters between tokens, and when you don't mind about portability.
  • You would use strtok_r() when you want to allow multiple delimiters between tokens and you don't want empty tokens (and POSIX is sufficiently portable for you).
  • You would only use strtok() when someone threatens your life if you don't do so. And you'd only use it for long enough to get you out of the life-threatening situation; you would then abandon all use of it once more. It is poisonous; do not use it. It would be better to write your own strtok_r() or strsep() than to use strtok().
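
If you do end up writing your own, both functions are short. What follows is a sketch in ISO C, not the code of any particular C library; the fallback_ names are invented here so they cannot clash with a real strsep() or strtok_r().

```c
#include <string.h>

/* Sketch of strsep(): returns the field starting at *stringp and advances
 * *stringp past the delimiter, or sets it to NULL after the last field. */
static char *fallback_strsep(char **stringp, const char *delims)
{
    char *start = *stringp;
    if (start == NULL)
        return NULL;

    char *end = strpbrk(start, delims);   /* first delimiter, if any */
    if (end != NULL)
        *end++ = '\0';                    /* terminate the field */
    *stringp = end;                       /* NULL when no delimiter was found */
    return start;
}

/* Sketch of strtok_r(): like strtok(), but the position is kept in the
 * caller-supplied *state instead of in a hidden static variable. */
static char *fallback_strtok_r(char *s, const char *delims, char **state)
{
    if (s == NULL)
        s = *state;                       /* continue from the previous call */

    s += strspn(s, delims);               /* skip leading delimiters */
    if (*s == '\0') {
        *state = s;
        return NULL;                      /* nothing but delimiters left */
    }

    char *end = s + strcspn(s, delims);   /* end of this token */
    if (*end != '\0')
        *end++ = '\0';
    *state = end;
    return s;
}
```

As with the real functions, both sketches modify the string they are given, so they must not be called on a string literal.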

Why is strtok() poisonous?

The strtok() function is poisonous if used in a library function. If your library function uses strtok(), it must be documented clearly.

That's because:

  1. If any calling function is using strtok() and calls your function that also uses strtok(), you break the calling function.
  2. If your function calls any function that calls strtok(), that will break your function's use of strtok().
  3. If your program is multithreaded, at most one thread can be using strtok() at any given time, and that restriction holds across a whole sequence of strtok() calls, not just a single call.
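
The first two cases can be reproduced in a few lines. In this sketch (all function names are hypothetical), an outer loop splits text into lines and calls a helper that splits each line into words. The _POSIX_C_SOURCE define is an assumption about how your system exposes strtok_r().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* assumption: POSIX system; declares strtok_r() */
#endif
#include <string.h>

/* Helper that (unwisely) uses strtok() internally. */
static int count_words_strtok(char *line)
{
    int n = 0;
    for (char *w = strtok(line, " "); w != NULL; w = strtok(NULL, " "))
        n++;
    return n;
}

/* Broken: the helper replaces the saved position of the outer loop, so
 * the outer strtok(NULL, ...) no longer continues in 'text'. */
static int count_lines_broken(char *text)
{
    int lines = 0;
    for (char *l = strtok(text, "\n"); l != NULL; l = strtok(NULL, "\n")) {
        (void)count_words_strtok(l);
        lines++;
    }
    return lines;
}

/* Same helper with strtok_r(): the position lives in a local variable. */
static int count_words_r(char *line)
{
    int n = 0;
    char *state = NULL;
    for (char *w = strtok_r(line, " ", &state); w != NULL;
         w = strtok_r(NULL, " ", &state))
        n++;
    return n;
}

/* Fixed: outer and inner loops each carry their own state. */
static int count_lines_fixed(char *text)
{
    int lines = 0;
    char *state = NULL;
    for (char *l = strtok_r(text, "\n", &state); l != NULL;
         l = strtok_r(NULL, "\n", &state)) {
        (void)count_words_r(l);
        lines++;
    }
    return lines;
}
```

Given the three-line input "a b\nc d\ne f", count_lines_broken() stops after the first line and returns 1, while count_lines_fixed() returns 3.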

The root of this problem is the saved state between calls that allows strtok() to continue where it left off. There is no sensible way to fix the problem other than "do not use strtok()".

  • You can use strsep() if it is available.
  • You can use POSIX's strtok_r() if it is available.
  • You can use Microsoft's strtok_s() if it is available.
  • Nominally, you could use the ISO/IEC 9899:2011 Annex K.3.7.3.1 function strtok_s(), but its interface is different from both strtok_r() and Microsoft's strtok_s().

BSD strsep():

char *strsep(char **stringp, const char *delim);

POSIX strtok_r():

char *strtok_r(char *restrict s, const char *restrict sep, char **restrict state);

Microsoft strtok_s():

char *strtok_s(char *strToken, const char *strDelimit, char **context);

Annex K strtok_s():

char *strtok_s(char * restrict s1, rsize_t * restrict s1max,
               const char * restrict s2, char ** restrict ptr);

Note that this has 4 arguments, not 3 as in strtok_r() and Microsoft's strtok_s().
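
Because POSIX strtok_r() and Microsoft's strtok_s() take the same three arguments in the same order, one common way to cover both is a small macro. This is a sketch: the tok_r macro and split_fields() function are names invented here, it assumes that _MSC_VER identifies the Microsoft library, and it deliberately ignores the 4-argument Annex K variant.

```c
#if defined(_MSC_VER)
#  include <string.h>
   /* Microsoft's 3-argument strtok_s() */
#  define tok_r(s, delims, state) strtok_s((s), (delims), (state))
#else
#  ifndef _POSIX_C_SOURCE
#    define _POSIX_C_SOURCE 200809L   /* assumption: POSIX system */
#  endif
#  include <string.h>
#  define tok_r(s, delims, state) strtok_r((s), (delims), (state))
#endif

/* Splits 'line' in place and stores up to 'max' token pointers in 'out'.
 * Returns the number of tokens stored. Empty fields are skipped, as with
 * strtok(). */
static int split_fields(char *line, const char *delims, char **out, int max)
{
    int n = 0;
    char *state = NULL;

    for (char *t = tok_r(line, delims, &state);
         t != NULL && n < max;
         t = tok_r(NULL, delims, &state))
        out[n++] = t;
    return n;
}
```

Calling split_fields() on "a, b,,c" with ", " as the delimiter set stores the three tokens "a", "b" and "c".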