How to use istream with strings

hyperknot · Jun 28, 2011

I would like to read a file into a string, and I am looking for different ways to do it efficiently.

Using a fixed-size char* buffer

I received an answer from Tony that creates a 16 KB buffer, reads into that buffer, and appends the buffer to the string until there is nothing more to read. I understand how it works, and I found it very fast. What I don't understand is the claim in the comments on that answer that this approach copies everything twice. But as I understand it, it only happens in the memory, not from the disk, so it is almost unnoticeable. Is it a problem that it copies from the buffer to the string in memory?

Using istreambuf_iterator

The other answer I received uses istreambuf_iterator. The code looks beautiful and minimal, but it is extremely slow. I don't know why that happens. Why are those iterators so slow?

Using memcpy()

For this question I received comments that I should use memcpy() as it is the fastest native method. But how can I use memcpy() with a string and an ifstream object? Isn't ifstream supposed to work with its own read function? Why does using memcpy() ruin portability? I am looking for a solution which is compatible with VS2010 as well as GCC. Why would memcpy() not work with those?

+ Any other efficient ways?

What do you recommend? What should I use for small (< 10 MB) binary files?

(I did not want to split this question into parts, as I am more interested in a comparison of the different ways to read an ifstream into a string.)

Answer

Konrad Rudolph · Jun 28, 2011

it only happens in the memory, not from the disk, so it is almost unnoticeable

That is indeed correct. Still, a solution that doesn’t do that may be faster.

Why are those iterators so slow?

The code is slow not because of the iterators but because the string doesn’t know how much memory to allocate: istreambuf_iterators can only be traversed once, so the string cannot measure the input before copying it. It is essentially forced to perform repeated concatenations, and the resulting memory reallocations are very slow.

My favourite one-liner, from another answer, streams directly from the underlying buffer:

string str(static_cast<stringstream const&>(stringstream() << in.rdbuf()).str());

On recent platforms this will indeed pre-allocate the buffer. It will, however, still result in a redundant copy (from the stringstream to the final string).
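Spelled out over several lines, the same idea might look like the sketch below, which makes the redundant copy visible. The function name read_all_rdbuf is hypothetical, and an ostringstream is used instead of a stringstream purely for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: the one-liner, unrolled.
std::string read_all_rdbuf(std::istream& in) {
    std::ostringstream buffer;
    // Inserting a streambuf pointer copies everything from the input
    // stream's buffer into the ostringstream's internal storage.
    buffer << in.rdbuf();
    // str() returns a copy of that storage: this is the redundant copy.
    return buffer.str();
}
```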