How to read a binary file into a vector of unsigned chars

LihO picture LihO · Feb 28, 2013 · Viewed 75.5k times · Source

Lately I've been asked to write a function that reads the binary file into the std::vector<BYTE> where BYTE is an unsigned char. Quite quickly I came with something like this:

#include <fstream>
#include <vector>
typedef unsigned char BYTE;

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::streampos fileSize;
    std::ifstream file(filename, std::ios::binary);

    // get its size:
    file.seekg(0, std::ios::end);
    fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    // read the data:
    std::vector<BYTE> fileData(fileSize);
    file.read((char*) &fileData[0], fileSize);
    return fileData;
}

which seems to be unnecessarily complicated and the explicit cast to char* that I was forced to use while calling file.read doesn't make me feel any better about it.


Another option is to use std::istreambuf_iterator:

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::ifstream file(filename, std::ios::binary);

    // read the data:
    return std::vector<BYTE>((std::istreambuf_iterator<char>(file)),
                              std::istreambuf_iterator<char>());
}

which is pretty simple and short, but still I have to use the std::istreambuf_iterator<char> even when I'm reading into std::vector<unsigned char>.


The last option that seems to be perfectly straightforward is to use std::basic_ifstream<BYTE>, which kinda expresses it explicitly that "I want an input file stream and I want to use it to read BYTEs":

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::basic_ifstream<BYTE> file(filename, std::ios::binary);

    // read the data:
    return std::vector<BYTE>((std::istreambuf_iterator<BYTE>(file)),
                              std::istreambuf_iterator<BYTE>());
}

but I'm not sure whether basic_ifstream is an appropriate choice in this case.

What is the best way of reading a binary file into the vector? I'd also like to know what's happening "behind the scene" and what are the possible problems I might encounter (apart from stream not being opened properly which might be avoided by simple is_open check).

Is there any good reason why one would prefer to use std::istreambuf_iterator here?
(the only advantage that I can see is simplicity)

Answer

jww picture jww · Feb 15, 2014

When testing for performance, I would include a test case for:

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::ifstream file(filename, std::ios::binary);

    // Stop eating new lines in binary mode!!!
    file.unsetf(std::ios::skipws);

    // get its size:
    std::streampos fileSize;

    file.seekg(0, std::ios::end);
    fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    // reserve capacity
    std::vector<BYTE> vec;
    vec.reserve(fileSize);

    // read the data:
    vec.insert(vec.begin(),
               std::istream_iterator<BYTE>(file),
               std::istream_iterator<BYTE>());

    return vec;
}

My thinking is that the constructor of Method 1 touches the elements in the vector, and then the read touches each element again.

Method 2 and Method 3 look most promising, but could suffer one or more resize's. Hence the reason to reserve before reading or inserting.

I would also test with std::copy:

...
std::vector<byte> vec;
vec.reserve(fileSize);

std::copy(std::istream_iterator<BYTE>(file),
          std::istream_iterator<BYTE>(),
          std::back_inserter(vec));

In the end, I think the best solution will avoid operator >> from istream_iterator (and all the overhead and goodness from operator >> trying to interpret binary data). But I don't know what to use that allows you to directly copy the data into the vector.

Finally, my testing with binary data is showing ios::binary is not being honored. Hence the reason for noskipws from <iomanip>.