Getting std::getline to deal with \n, \r and \r\n

user3353819 · Nov 20, 2014

Specifically I'm interested in istream& getline ( istream& is, string& str );. Is there an option to the ifstream constructor to tell it to convert all newline encodings to '\n' under the hood? I want to be able to call getline and have it gracefully handle all line endings.

Update: To clarify, I want to be able to write code that compiles almost anywhere and takes input from almost anywhere, including the rare files that have '\r' without '\n', so that users of the software are inconvenienced as little as possible.

It's easy to work around the issue, but I'm still curious as to the right way, in the standard, to flexibly handle all text file formats.

getline reads in a full line, up to a '\n', into a string. The '\n' is consumed from the stream, but getline doesn't include it in the string. That's fine so far, but there might be a '\r' just before the '\n' that gets included into the string.

There are three types of line endings seen in text files: '\n' is the conventional ending on Unix machines, '\r' was (I think) used on old Mac operating systems, and Windows uses a pair, '\r' followed by '\n'.

The problem is that getline leaves the '\r' on the end of the string.

ifstream f("a_text_file_of_unknown_origin");
string line;
getline(f, line);
if(!f.fail()) { // a line was read (it may be empty)
   // BUT, there might be a '\r' at the end now.
}
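The effect can be reproduced without a file. The sketch below is only an illustration: `first_line` and `check_crlf_behaviour` are hypothetical names, and a `std::istringstream` stands in for the file because it performs no newline translation, much like a Windows text file read on a Unix machine.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns whatever std::getline produces for the
// first line of `text`.
std::string first_line(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    std::getline(in, line);
    return line;
}

// Hypothetical check: the asserts document the behaviour described above.
void check_crlf_behaviour() {
    // Unix ending: the '\n' is consumed and not stored.
    assert(first_line("abc\ndef\n") == "abc");
    // Windows ending: the '\n' is consumed, but the '\r' stays in the string.
    assert(first_line("abc\r\ndef\r\n") == "abc\r");
}
```

Calling `check_crlf_behaviour()` passes silently, which confirms that the stray '\r' comes from the data itself rather than from `getline` adding anything.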

Edit Thanks to Neil for pointing out that f.good() isn't what I wanted. !f.fail() is what I want.

I can remove it manually myself (see the edit to this question), which is easy for the Windows text files. But I'm worried that somebody will feed in a file that uses only '\r' as its line ending. In that case, I presume getline will consume the whole file, thinking that it is a single line!
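That presumption is easy to check with an in-memory stream. This is a small sketch, not a claim about any particular platform's file handling: `count_lines` and `check_cr_only_behaviour` are hypothetical names, and the `std::istringstream` again stands in for a file read without newline translation.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts how many lines std::getline yields for `text`.
std::size_t count_lines(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line)) {
        ++n;
    }
    return n;
}

// Hypothetical check: the same three lines with each kind of line ending.
void check_cr_only_behaviour() {
    assert(count_lines("one\ntwo\nthree\n") == 3);        // '\n' endings
    assert(count_lines("one\r\ntwo\r\nthree\r\n") == 3);  // '\r\n' endings
    // '\r' only: getline never sees its delimiter, so everything is one line.
    assert(count_lines("one\rtwo\rthree\r") == 1);
}
```

So with the default delimiter, a '\r'-only input is indeed returned as one long line with the '\r' characters embedded in it.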

.. and that's not even considering Unicode :-)

.. maybe Boost has a nice way to consume one line at a time from any text-file type?

Edit I'm using this, to handle the Windows files, but I still feel I shouldn't have to! And this won't work for the '\r'-only files.

if(!line.empty() && *line.rbegin() == '\r') {
    line.erase( line.length()-1, 1);
}
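For comparison, here is a rough sketch of a hand-written workaround that accepts all three endings. It is not a standard facility: `getline_any_eol` and `check_all_line_endings` are hypothetical names, and the function reads one character at a time, so it favours clarity over speed.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the standard library): reads one line
// from `in`, treating "\n", "\r\n" or a lone "\r" as the terminator.
std::istream& getline_any_eol(std::istream& in, std::string& line) {
    line.clear();
    bool got_any = false;
    char c;
    while (in.get(c)) {
        got_any = true;
        if (c == '\n') {
            return in;                 // Unix ending
        }
        if (c == '\r') {
            if (in.peek() == '\n') {
                in.get(c);             // Windows ending: swallow the '\n'
            }
            return in;                 // otherwise a lone '\r' ending
        }
        line += c;
    }
    // End of input reached. If characters were read, report them as a final
    // unterminated line by clearing failbit (eofbit stays set).
    if (got_any) {
        in.clear(in.rdstate() & ~std::ios_base::failbit);
    }
    return in;
}

// Hypothetical check: one input that mixes all three endings.
void check_all_line_endings() {
    std::istringstream in("a\r\nb\rc\nd");
    std::vector<std::string> lines;
    std::string line;
    while (getline_any_eol(in, line)) {
        lines.push_back(line);
    }
    assert(lines.size() == 4);
    assert(lines[0] == "a" && lines[1] == "b");
    assert(lines[2] == "c" && lines[3] == "d");
}
```

It is used like `getline` in a `while` loop, but it is exactly the kind of extra code I was hoping the standard library would make unnecessary.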

Answer