I have problems with my C program when I try to read / parse input.
Help?
This is a FAQ entry.
StackOverflow has many questions related to reading input in C, with answers usually focused on the specific problem of that particular user without really painting the whole picture.

This is an attempt to cover a number of common mistakes comprehensively, so this specific family of questions can be answered simply by marking them as duplicates of this one:

- Why does scanf("%d", ...) / scanf("%c", ...) fail?
- Why does gets() crash?

The answer is marked as community wiki. Feel free to improve and (cautiously) extend.
Text vs. binary mode

A "binary mode" stream is read in exactly as it has been written. However, there might (or might not) be an implementation-defined number of null characters ('\0') appended at the end of the stream.

A "text mode" stream may do a number of transformations, including (but not limited to):

- converting new-lines ('\n') to something else on output (e.g. "\r\n" on Windows) and back to '\n' on input;
- adding, altering, or deleting characters other than printing characters (those for which isprint(c) is true), horizontal tabs, and new-lines.

It should be obvious that text and binary mode do not mix. Open text files in text mode, and binary files in binary mode.
Check whether fopen() succeeded

The attempt to open a file may fail for various reasons, lack of permissions or a missing file being the most common ones. In that case, fopen() returns a NULL pointer. Always check whether fopen() returned a NULL pointer before attempting to read from or write to the file.
When fopen() fails, it usually sets errno (declared in errno.h) to indicate why. (This is technically not a requirement of the C language, but both POSIX and Windows guarantee it.) errno is a code number which can be compared against the constants in errno.h, but in simple programs all you usually need to do is turn it into an error message and print that, using perror() or strerror(). The error message should also include the filename you passed to fopen(); if it does not, you will be very confused when the problem is that the filename isn't what you thought it was.
```c
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) {
        // alternatively, just `perror(argv[1])`
        fprintf(stderr, "cannot open %s: %s\n", argv[1], strerror(errno));
        return 1;
    }
    // read from fp here
    fclose(fp);
    return 0;
}
```
Check any functions you call for success

This should be obvious, but do check the documentation of every function you call for its return value and error handling, and check for those conditions.
These errors are easy to handle when you catch the condition early, but lead to lots of head-scratching if you do not.
EOF, or "why does the last line print twice"
The function feof() returns true (nonzero) if the end-of-file indicator of the stream is set. A misunderstanding of what "reaching" EOF actually means makes many beginners write something like this:
```c
// BROKEN CODE
while (!feof(fp)) {
    fgets(buffer, BUFFER_SIZE, fp);
    printf("%s", buffer);
}
```
This makes the last line of the input print twice, because when the last line is read (up to the final newline, the last character in the input stream), EOF is not yet set.
EOF only gets set when you attempt to read past the last character!
So the code above loops once more: fgets() fails to read another line, sets the end-of-file indicator, and leaves the contents of buffer untouched, which then get printed again.
Instead, check directly whether fgets() failed:

```c
// GOOD CODE
while (fgets(buffer, BUFFER_SIZE, fp)) {
    printf("%s", buffer);
}
```
Do not use gets(), ever
There is no way to use this function safely: it is not told the size of the buffer, so any input line longer than the buffer overflows it. Because of this, it was removed from the language in C11.
Do not use fflush() on stdin or any other stream open for reading, ever
Many people expect fflush(stdin) to discard user input that has not yet been read. It does not do that. In plain ISO C, calling fflush() on an input stream has undefined behavior. It does have well-defined behavior in POSIX and in MSVC, but neither makes it discard user input that has not yet been read.
Usually, the right way to clear pending input is to read and discard characters up to and including a newline, but not beyond:

```c
int c;  /* int, not char, so that EOF can be distinguished */
do {
    c = getchar();
} while (c != EOF && c != '\n');
```
Do not use *scanf() for potentially malformed input
Many tutorials teach you to use *scanf() for reading any kind of input, because it is so versatile.
But the purpose of *scanf() is really to read bulk data that can be relied upon, to some degree, to be in a predefined format (such as data written by another program).
Even then, *scanf() can trip the unobservant:

- It skips leading whitespace in the input, except when it does not ([, c, and n conversions). (See next paragraph.)

When *scanf() does not work as expected
A frequent problem with *scanf() is unread whitespace (' ', '\n', ...) left in the input stream that the programmer did not account for.
Reading a number ("%d" et al.) or a string ("%s") stops at any whitespace. And while most *scanf() conversion specifiers skip leading whitespace in the input, [, c, and n do not. So after reading a number from a line of input, the newline is still the first pending input character: %c reads that newline instead of the character you expected, and %[ fails to match unless its set includes '\n'.
You can skip over the newline in the input by explicitly reading it (e.g. via fgetc()), or by adding a whitespace character before the conversion in your *scanf() format string, as in " %c". (A single whitespace character in the format string matches any amount of whitespace in the input, including none.)
Alternatives

We just advised against using *scanf() except when you really, positively know what you are doing. So, what to use as a replacement?
Instead of reading and parsing the input in one go, as *scanf() attempts to do, separate the steps.
Read (part of) a line of input via fgets()
fgets() takes a size parameter and reads at most one character less than that, always leaving room for the terminating '\0', so it cannot overflow your buffer. If the input line fit into your buffer completely, the last character before the '\0' is the newline ('\n'). If it did not all fit, you are looking at a partially-read line. (The exception is a last line that ends at end-of-file without a newline.)
Parse the line in-memory
Especially useful for in-memory parsing are the strtol() and strtod() function families, which provide similar functionality to the *scanf() conversion specifiers d, i, u, o, x, a, e, f, and g.
But they also tell you exactly where they stopped parsing (through their end-pointer argument), and handle numbers too large for the target type meaningfully: they set errno to ERANGE, whereas *scanf() has undefined behavior in that case.
Beyond those, C offers a wide range of string processing functions. Since you have the input in memory, and always know exactly how far you have parsed it already, you can walk back as many times as you like trying to make sense of the input.
And if all else fails, you have the whole line available to print a helpful error message for the user.
Close your files

Make sure you explicitly close any stream you have (successfully) opened. This flushes any as-yet unwritten buffers, and avoids resource leaks.

```c
fclose(fp);
```