strtok() issue: If tokens are delimited by delimiters, why is the last token between a delimiter and the null '\0'?

Rüppell's Vulture · May 15, 2013 · Viewed 33.9k times

In the following program, strtok() works as expected for the most part, but I can't understand the reason behind one finding. I have read the following about strtok():

To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token). And then scans starting from this beginning of the token for the first character contained in delimiters, which becomes the end of the token.

Source: http://www.cplusplus.com/reference/cstring/strtok/

And as we know, strtok() places a \0 at the end of each token. But in the following program, the last delimiter is a dot (.), and Toad sits between that dot and the closing quotation mark ("). The dot is a delimiter in my program, but there is no delimiter after Toad, not even a whitespace character (which is also a delimiter in my program). Please clear up the following confusion arising from this:

Why does strtok() consider Toad a token even though it is not between two delimiters? This is what I read about how strtok() behaves when it encounters a null character (\0):

Once the terminating null character of str has been found in a call to strtok, all subsequent calls to this function with a null pointer as the first argument return a null pointer.

Source: http://www.cplusplus.com/reference/cstring/strtok/

Nowhere does it say that once a null character is encountered, a pointer to the beginning of the token is returned. (We don't even have a token here: no delimiter character was found after the scan began from the beginning of the token, i.e. from the 'T' of Toad, so we never got an end of the token. We only found a null character, not a delimiter.) So why is the part between the last delimiter and the closing quotation mark of the argument string considered a token by strtok()? Please explain this.

Code:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* strtok() modifies its argument, so the string must be a writable array */
    char str[] = " Falcon,eagle-hawk..;buzzard,gull..pigeon sparrow,hen;owl.Toad";
    char *pch = strtok(str, " ;,.-");

    while (pch != NULL)
    {
        printf("%s\n", pch);
        pch = strtok(NULL, " ;,.-");   /* NULL continues with the same string */
    }

    return 0;
}

Output:

Falcon
eagle
hawk
buzzard
gull
pigeon
sparrow
hen
owl
Toad

Answer

Daniel Fischer · May 15, 2013

The standard's specification of strtok (7.24.5.8) is pretty clear. In particular, paragraph 4 (emphasis added by me) is directly relevant to the question, if I understand it correctly:

3 The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.

4 The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer. If such a character is found, it is overwritten by a null character, which terminates the current token. The strtok function saves a pointer to the following character, from which the next search for a token will start.
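To make the two paragraphs concrete, here is a rough sketch of a strtok look-alike built on strspn and strcspn. The name sketch_strtok and the static variable next are made up for this illustration. Real library implementations may be written differently, so treat it as a reading aid for the standard's wording, not as the actual implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strtok look-alike, written only to mirror paragraphs 3 and 4.
   Like strtok, it keeps its position in a static variable, so it is not
   reentrant. */
char *sketch_strtok(char *s1, const char *s2)
{
    static char *next = NULL;   /* where the next search will start */

    if (s1 == NULL)
        s1 = next;              /* continue the sequence already in progress */
    if (s1 == NULL)
        return NULL;            /* nothing left to search */

    /* Paragraph 3: skip separators to find the start of a token. */
    s1 += strspn(s1, s2);
    if (*s1 == '\0') {
        next = NULL;
        return NULL;            /* only separators were left: no token */
    }

    /* Paragraph 4: search from there for a separator. */
    char *end = s1 + strcspn(s1, s2);
    if (*end == '\0') {
        /* No separator found: the token extends to the end of the string,
           and subsequent searches return a null pointer. */
        next = NULL;
    } else {
        *end = '\0';            /* overwrite the separator to end the token */
        next = end + 1;         /* the next search starts after it */
    }
    return s1;
}
```

With a writable buffer holding "owl.Toad" and the delimiters from the question, the first call returns "owl", the second returns "Toad" through the "no separator found" branch, and the third returns a null pointer.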

In a call

char *where = strtok(string_or_NULL, delimiters);

the token (a pointer to which is) returned - if any - extends from the first non-delimiter character found from the starting position (inclusive) until the next delimiter character (exclusive), if one exists, or until the end of the string, if no later delimiter character exists.
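The end-of-string case can be checked with the real strtok. The helper below is only a sketch for this answer: the name print_tokens is made up, and the bracketed print format is an arbitrary choice. It tokenizes a writable buffer in place and returns a pointer to the last token it saw.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: tokenizes buf in place with strtok, prints each token
   in brackets, and returns a pointer to the last token found (or NULL if the
   buffer contained no token at all). */
char *print_tokens(char *buf, const char *delims)
{
    char *last = NULL;

    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims)) {
        printf("[%s]\n", tok);
        last = tok;
    }
    return last;
}
```

Given char str[] = "owl.Toad"; and the delimiters " ;,.-", the returned pointer refers to "Toad", and last + strlen(last) is the address of the terminating null character that str already had. No delimiter was overwritten to end that token. The string's own terminator ends it.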

The linked description, unlike the standard, doesn't explicitly mention the case of a token extending to the end of the string, so it is incomplete in that respect.