Specifically I'm interested in istream& getline ( istream& is, string& str );
. Is there an option to the ifstream constructor to tell it to convert all newline encodings to '\n' under the hood? I want to be able to call getline
and have it gracefully handle all line endings.
Update: To clarify, I want to be able to write code that compiles almost anywhere, and will take input from almost anywhere. Including the rare files that have '\r' without '\n'. Minimizing inconvenience for any users of the software.
It's easy to workaround the issue, but I'm still curious as to the right way, in the standard, to flexibly handle all text file formats.
getline
reads in a full line, up to a '\n', into a string. The '\n' is consumed from the stream, but getline doesn't include it in the string. That's fine so far, but there might be a '\r' just before the '\n' that gets included into the string.
There are three types of line endings seen in text files: '\n' is the conventional ending on Unix machines, '\r' was (I think) used on old Mac operating systems, and Windows uses a pair, '\r' following by '\n'.
The problem is that getline
leaves the '\r' on the end of the string.
ifstream f("a_text_file_of_unknown_origin");
string line;
getline(f, line);
if(!f.fail()) { // a non-empty line was read
// BUT, there might be an '\r' at the end now.
}
Edit Thanks to Neil for pointing out that f.good()
isn't what I wanted. !f.fail()
is what I want.
I can remove it manually myself (see edit of this question), which is easy for the Windows text files. But I'm worried that somebody will feed in a file containing only '\r'. In that case, I presume getline will consume the whole file, thinking that it is a single line!
.. and that's not even considering Unicode :-)
.. maybe Boost has a nice way to consume one line at a time from any text-file type?
Edit I'm using this, to handle the Windows files, but I still feel I shouldn't have to! And this won't fork for the '\r'-only files.
if(!line.empty() && *line.rbegin() == '\r') {
line.erase( line.length()-1, 1);
}
As Neil pointed out, "the C++ runtime should deal correctly with whatever the line ending convention is for your particular platform."
However, people do move text files between different platforms, so that is not good enough. Here is a function that handles all three line endings ("\r", "\n" and "\r\n"):
std::istream& safeGetline(std::istream& is, std::string& t)
{
t.clear();
// The characters in the stream are read one-by-one using a std::streambuf.
// That is faster than reading them one-by-one using the std::istream.
// Code that uses streambuf this way must be guarded by a sentry object.
// The sentry object performs various tasks,
// such as thread synchronization and updating the stream state.
std::istream::sentry se(is, true);
std::streambuf* sb = is.rdbuf();
for(;;) {
int c = sb->sbumpc();
switch (c) {
case '\n':
return is;
case '\r':
if(sb->sgetc() == '\n')
sb->sbumpc();
return is;
case std::streambuf::traits_type::eof():
// Also handle the case when the last line has no line ending
if(t.empty())
is.setstate(std::ios::eofbit);
return is;
default:
t += (char)c;
}
}
}
And here is a test program:
int main()
{
std::string path = ... // insert path to test file here
std::ifstream ifs(path.c_str());
if(!ifs) {
std::cout << "Failed to open the file." << std::endl;
return EXIT_FAILURE;
}
int n = 0;
std::string t;
while(!safeGetline(ifs, t).eof())
++n;
std::cout << "The file contains " << n << " lines." << std::endl;
return EXIT_SUCCESS;
}