I have a CSV file containing data such as
value;name;test;etc
which I'm trying to split by using strtok(string, ";"). However, this file can contain zero-length data, like this:
value;;test;etc
which strtok() skips. Is there a way to stop strtok() from skipping zero-length data like this?
A possible alternative is to use the BSD function strsep() instead of strtok(), if available.
From the man page:
The strsep() function is intended as a replacement for the strtok() function. While the strtok() function should be preferred for portability reasons (it conforms to ISO/IEC 9899:1990 ("ISO C90")) it is unable to handle empty fields, i.e., detect fields delimited by two adjacent delimiter characters, or to be used for more than a single string at a time. The strsep() function first appeared in 4.4BSD.
A simple example (also copied from that man page):
char *token, *string, *tofree;

tofree = string = strdup("value;;test;etc");
while ((token = strsep(&string, ";")) != NULL)
    printf("token=%s\n", token);
free(tofree);
Output:
token=value
token=
token=test
token=etc
so empty fields are handled correctly.
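If strsep() is not available on your platform (it is not part of ISO C), roughly the same behaviour can be sketched with the standard strcspn(). The helper name next_field below is made up for illustration and is not a library function; like strsep(), it modifies the buffer in place, so the input must be writable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a library function): returns the next field of
 * *cursor and advances *cursor past the delimiter, or returns NULL once
 * every field has been consumed. Empty fields come back as "". */
char *next_field(char **cursor, const char *delims)
{
    assert(cursor != NULL && delims != NULL);

    char *start = *cursor;
    if (start == NULL)
        return NULL;                 /* nothing left to scan */

    /* strcspn() gives the length of the run containing no delimiter */
    char *end = start + strcspn(start, delims);
    if (*end != '\0') {
        *end = '\0';                 /* terminate this field in place */
        *cursor = end + 1;           /* next field starts after the delimiter */
    } else {
        *cursor = NULL;              /* that was the last field */
    }
    return start;
}
```

It is used the same way as strsep() in the loop above: point a char * at a writable copy of the line, then call next_field(&cursor, ";") until it returns NULL.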
Of course, as others have already said, none of these simple tokenizer functions handles delimiters inside quotation marks correctly, so if that is an issue, you should use a proper CSV parsing library.