I can specify the maximum number of characters for scanf to read into a buffer using this technique:
char buffer[64];
/* Read one line of text to buffer. */
scanf("%63[^\n]", buffer);
But what if we do not know the buffer length when we write the code? What if it is the parameter of a function?
void function(FILE *file, size_t n, char buffer[n])
{
    /* ... */
    fscanf(file, "%[^\n]", buffer); /* WHAT NOW? */
}
This code is vulnerable to buffer overflows because fscanf does not know how big the buffer is.
I remember seeing this before and started to think that it was the solution to the problem:
fscanf(file, "%*[^\n]", n, buffer);
My first thought was that the * in "%*[^\n]" meant that the maximum string size is passed as an argument (in this case n). This is the meaning of the * in printf.
When I checked the documentation for scanf, I found out that it means that scanf should discard the result of [^\n].
This left me somewhat disappointed, as I think it would be very useful to be able to pass the buffer size dynamically to scanf.
Is there any way I can pass the buffer size to scanf dynamically?
There isn't an analog to the printf() format specifier * in scanf().
In The Practice of Programming, Kernighan and Pike recommend using snprintf() to create the format string:
size_t sz = 64;
char format[32];
snprintf(format, sizeof(format), "%%%zus", sz);
if (scanf(format, buffer) != 1) { …oops… }
Upgrading the example to a complete function:
int read_name(FILE *fp, char *buffer, size_t bufsiz)
{
    char format[32];  /* room for '%', up to 20 digits, 's' and the null */
    snprintf(format, sizeof(format), "%%%zus", bufsiz - 1);
    return fscanf(fp, format, buffer);
}
This emphasizes that the size in the format specification is one less than the size of the buffer (it is the number of non-null characters that can be stored, not counting the terminating null). Note that this is in contrast to fgets(), where the size (an int, incidentally; not a size_t) is the size of the buffer, not one less. There are multiple ways of improving the function, but it shows the point. (You can replace the s in the format with [^\n] if that's what you want.)
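As one possible improvement, the sketch below swaps the s for [^\n], checks that the generated format fits, and rejects buffer sizes too small to hold even one character (a field width of 0 is not valid in a scanf format). The name read_line and the choice to return 0 for an unusable buffer size are assumptions made for illustration, not part of the original function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read at most bufsiz - 1 characters of the current
 * line into buffer. The newline is left unread in the stream.
 * Returns the fscanf() result: 1 if something was stored, 0 if nothing
 * matched (for example an empty line), EOF at end of input.
 */
int read_line(FILE *fp, char *buffer, size_t bufsiz)
{
    char format[32];  /* '%' + up to 20 digits + "[^\n]" + null fits */
    int len;

    assert(fp != NULL && buffer != NULL);

    /* Assumption: treat a buffer with no room for data as "nothing read". */
    if (bufsiz < 2)
        return 0;

    len = snprintf(format, sizeof(format), "%%%zu[^\n]", bufsiz - 1);
    if (len < 0 || (size_t)len >= sizeof(format))
        return 0;  /* format would have been truncated; do not use it */

    return fscanf(fp, format, buffer);
}
```

If the line is longer than the buffer, the remaining characters stay in the stream and are returned by the next call, so the caller has to decide whether to keep reading or discard them.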
Also, as Tim Čas noted in the comments, if you want (the rest of) a line of input, you're usually better off using fgets() to read the line, but remember that it includes the newline in its output (whereas %63[^\n] leaves the newline to be read by the next I/O operation). For more general scanning (for example, 2 or 3 strings), this technique may be better, especially if used with fgets() or getline() and then sscanf() to parse the input.
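A minimal sketch of the fgets()-then-sscanf() approach follows. The helper names read_whole_line and split_two_words, and the fixed 32-byte word buffers, are hypothetical choices for this example only.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read one line with fgets() and strip the trailing
 * newline that fgets() keeps. Returns 1 on success, 0 on EOF, error, or
 * an unusable buffer size.
 */
int read_whole_line(FILE *fp, char *buffer, size_t bufsiz)
{
    int n;

    assert(fp != NULL && buffer != NULL);
    if (bufsiz < 2)
        return 0;

    /* fgets() takes an int that is the full buffer size, not size - 1. */
    n = (bufsiz > (size_t)INT_MAX) ? INT_MAX : (int)bufsiz;
    if (fgets(buffer, n, fp) == NULL)
        return 0;

    buffer[strcspn(buffer, "\n")] = '\0';  /* drop the newline, if present */
    return 1;
}

/*
 * Hypothetical helper: parse two whitespace-separated words out of a line
 * that has already been read. Returns the number of words stored (0 to 2).
 */
int split_two_words(const char *line, char first[32], char second[32])
{
    int rc = sscanf(line, "%31s %31s", first, second);
    return (rc < 0) ? 0 : rc;
}
```

Because the line is already in memory when sscanf() runs, a failed parse does not leave the input stream in an unknown position, and the same line can be re-parsed with a different format.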
Also, the TR 24731-1 'safe' functions, implemented by Microsoft (more or less) and standardized in Annex K of ISO/IEC 9899:2011 (the C11 standard), require a length explicitly:
if (scanf_s("%[^\n]", buffer, sizeof(buffer)) != 1)
...oops...
This avoids buffer overflows, but probably generates an error if the input is too long. The size could/should be specified in the format string as before:
if (scanf_s("%63[^\n]", buffer, sizeof(buffer)) != 1)
...oops...
if (scanf_s(format, buffer, sizeof(buffer)) != 1)
...oops...
Note that the warning (from some compilers under some sets of flags) about 'non-constant format string' has to be ignored or suppressed for code using the generated format string.
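With GCC or Clang, one way to suppress that warning locally is a diagnostic pragma around the function that uses the generated format. This is a sketch that assumes the warning in question is -Wformat-nonliteral; other compilers use different mechanisms, and the helper name read_word is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: GCC or Clang; the pragmas are skipped on other compilers. */
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wformat-nonliteral"
#endif

/*
 * Hypothetical helper: read one whitespace-delimited word of at most
 * bufsiz - 1 characters using a format string built at run time.
 * Returns the fscanf() result, or 0 if the buffer size is unusable.
 */
int read_word(FILE *fp, char *buffer, size_t bufsiz)
{
    char format[32];
    int len;

    assert(fp != NULL && buffer != NULL);
    if (bufsiz < 2)
        return 0;

    len = snprintf(format, sizeof(format), "%%%zus", bufsiz - 1);
    if (len < 0 || (size_t)len >= sizeof(format))
        return 0;

    return fscanf(fp, format, buffer);  /* non-literal format on purpose */
}

#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```

Keeping the push/pop pair tight around the one function means the warning stays active for the rest of the translation unit.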