What is better: reserve vector capacity, preallocate to size or push back in loop?

daljaz · Aug 25, 2015 · Viewed 14.2k times

I have a function that takes a pointer to a char array and a segment size as input arguments and calls another function that requires a std::vector<std::string>. The idea is that the input char array is "sectioned" into equal parts, and a vector of strings is formed.

The input char array format is several smaller arrays (or strings) of determined size, concatenated together. These are not assumed to be zero-terminated, although they might be. Examples for segment size 5 and number of elements 10:

char k[] = "1234\0001234\0001234\0001234\0001234\0001234\0001234\0001234\0001234\0001234\000";
char m[] = "1234\00067890987654321\000234567809876\0005432\000\000\0003456789098";
char n[] = "12345678909876543211234567890987654321123456789098";

Length of all char arrays is 51 (segment * elements + 1). My goal is to make the function use resources efficiently, most importantly execution time.

Since there are many ways to skin a cat, I have two (or three) ways to tackle this, and the question is, which is "better"? By that I mean faster and less resource-wasteful. I am not a professional, so be patient with me.

Here, values is preallocated and then each string assigned a value.

void myfnc_1(void *a_src, uint32_t a_segment) {
    // a_segment = 5 for example
    size_t nSize = GetSize(); // another method, gets 10
    std::vector<std::string> values(nSize); // nSize default-constructed strings
    char* v = static_cast<char*>(a_src); // take k, n or m for example

    for (size_t i = 0; i < nSize; ++i) {
        values.at(i).assign(v, a_segment);
        v += a_segment;
    }
}

Here, the vector is not preallocated; instead, a new string is added on each iteration.

void myfnc_2(void *a_src, uint32_t a_segment) {
    size_t nSize = GetSize();
    std::vector<std::string> values; // empty vector; elements are appended below
    char* v = static_cast<char*>(a_src);

    for (size_t i = 0; i < nSize; ++i) {
        values.push_back("");
        values.back().assign(v, a_segment);
        v += a_segment;
    }
}

There might be a third way that's better. I'm not so experienced with vectors, so I don't know exactly. Do the segment length and number of elements make a difference if they are usually small (5, 10) or large (100, 10000)?

First post, big fan :)

Answer

Andrew · Aug 25, 2015

Adding elements to a vector

There are several ways to add data to a vector:

  • create an empty vector, push_back() elements into it.
  • create an empty vector, allocate some capacity with reserve(), then push_back() elements into it.
  • create a vector of n elements, use indexing and copy-assignment.
  • create an empty vector, emplace_back() elements into it.
  • create an empty vector, allocate some capacity with reserve(), then emplace_back() elements into it.

There are other ways, e.g. creating the container with a pair of iterators, or filling it via standard library algorithms. I won't consider these here.

General considerations

Two considerations are important for the analysis that follows:

  • Avoid (re)allocations: memory allocation is slow, and reallocation often involves copying everything that's already in the container to the new location.
  • Avoid unnecessary work: It's better to construct the element you want than to default-construct, then assign. It's better to construct an element where you want it than to construct it elsewhere, then copy.

Other factors will also affect the efficiency of the chosen solution, but these are significant factors that we have direct control over. Other factors may become obvious through profiling your code.

push_back()

Each push_back() copy-constructs an element in the vector from the argument to the push_back() call. If the vector size() == capacity(), a reallocation will be performed. This will usually (but may not always) double the capacity, and may result in copying all of the existing elements into new storage.
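
As a rough illustration (this sketch is mine, not part of the benchmark below), the following snippet prints the capacity each time a push_back() triggers a reallocation; on most implementations you will see roughly geometric growth:

#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> v;
    std::size_t last_capacity = v.capacity();
    for (int i = 0; i < 100; ++i) {
        v.push_back("element");
        if (v.capacity() != last_capacity) {   // a reallocation just happened
            last_capacity = v.capacity();
            std::cout << "size " << v.size()
                      << " grew capacity to " << last_capacity << "\n";
        }
    }
}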

push_back() with pre-allocation

Using reserve() allocates enough memory for the elements before we start. It's always worth doing this if you know (or have a reasonable guess for) the number of elements. If you're guessing, over-estimates are better than under-estimates.

The push_back() call will still use the copy constructor of the element type, but there shouldn't be any allocations, because the space is already provided. You just pay the cost of a single allocation during the reserve() call. If you do run over the existing capacity, push_back() will reallocate, often doubling the capacity. This is why an over-estimate for the size is better; you're less likely to get a reallocation, whereas with an under-estimate, not only will you likely reallocate, but you'll waste memory allocating almost twice as much as you needed!

Note that the "doubling capacity" behaviour is not specified by the standard, but it is a common implementation, designed to reduce the reallocation frequency when using push_back() for data sets of unknown size.
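
A minimal sketch of the reserve-then-push_back pattern (again mine, not from the benchmark below); the assert documents the expectation that once the capacity is reserved, filling it causes no further reallocation:

#include <cassert>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> v;
    v.reserve(10);                      // single up-front allocation
    const std::size_t cap = v.capacity();
    for (int i = 0; i < 10; ++i) {
        v.push_back("element");         // copies the string; capacity is already sufficient
    }
    assert(v.capacity() == cap);        // no reallocation happened while filling
}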

indexing and element assignment

Here, we create a vector of the correct number of default-constructed elements, and then use the copy-assignment operator to replace them with the elements we want. This has only one vector allocation, but can be slow if copy-assignment does anything complicated. This doesn't really work for data sets of unknown (or only guessed) size; indexing is safe only while the index stays below size(), and you have to resort to push_back() or resizing if you need more. This isn't a good general solution, but it can work sometimes.
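
For completeness, a minimal sketch of this approach (my own, using the same std::string element type as the benchmark below):

#include <string>
#include <vector>

int main() {
    const std::size_t n = 10;
    std::vector<std::string> v(n);      // one allocation, n default-constructed strings
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = "element";               // copy-assignment replaces each default-constructed string
    }
}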

emplace_back()

emplace_back() constructs the element in-place with the arguments to the emplace_back() call. This can often be faster than the equivalent push_back() (but not always). It still allocates in the same pattern as push_back(), reserving some capacity, filling it, then reallocating when more is needed. Much of the same argument applies, but you can make some gains from the construction method.
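
To make the difference concrete, here is a small sketch of mine contrasting the two calls; with emplace_back() the (pointer, length) arguments go straight to the std::string constructor, so no temporary string is built first:

#include <string>
#include <vector>

int main() {
    const char* src = "12345";
    std::vector<std::string> v;

    // push_back: a temporary std::string is constructed from (src, 5),
    // then moved (or copied) into the vector.
    v.push_back(std::string(src, 5));

    // emplace_back: the arguments are forwarded to the std::string
    // constructor, so the element is built directly inside the vector.
    v.emplace_back(src, 5);
}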

emplace_back() with pre-allocation

This should be your go-to strategy for C++11 or later codebases. You gain the emplace_back() efficiency (where possible) and avoid repeated allocations. Of the mechanisms listed, this would be expected to be the fastest in most cases.
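
Applied to the function in the question, that strategy might look like the sketch below. This is my own illustration, not your original code: the element count is passed in as a parameter instead of calling your GetSize(), which isn't shown in the question.

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Splits a_src into n_elements strings of a_segment bytes each, using
// reserve() + emplace_back(): one vector allocation, and each string is
// constructed in place from (pointer, length).
std::vector<std::string> split_segments(const void* a_src, std::uint32_t a_segment,
                                        std::size_t n_elements) {
    std::vector<std::string> values;
    values.reserve(n_elements);

    const char* v = static_cast<const char*>(a_src);
    for (std::size_t i = 0; i < n_elements; ++i) {
        values.emplace_back(v, a_segment);   // std::string(const char*, size_t)
        v += a_segment;
    }
    return values;
}

int main() {
    char k[] = "1234\0001234\0001234\0001234\0001234\0001234\0001234\0001234\0001234\0001234\000";
    std::cout << split_segments(k, 5, 10).size() << "\n";   // prints 10
}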

A note on efficiency

Efficiency can be measured in several ways. Execution time is a common measure, but not always the most relevant. General advice about which strategy to use is based on experience and essentially "averages" the various effects to provide some reasonable statements about what to do first. As always, if any kind of efficiency is critical for your application, the only way to be sure you're optimising the right place is to profile your code, make changes, and then profile it again to demonstrate that the changes had the desired effect. Different data types, hardware, I/O requirements, etc. can all affect this kind of timing, and you will never know how those effects combine in your particular application until you profile it.

Example

Live example: http://coliru.stacked-crooked.com/a/83d23c2d0dcee2ff

In this example, I fill a vector with 10,000 strings using each of the approaches listed above. I time each one and print the results.

This is similar to your question, but not identical; your results will differ!

Note that emplace_back() with reserve() is the fastest, but indexing & assignment is also quick here. This is likely because std::string's default constructor does very little and copy-assigning these short strings is cheap. The other approaches are several times slower.

#include <chrono>
#include <iostream>
#include <string>
#include <vector>

using Clock = std::chrono::high_resolution_clock;
using time_point = std::chrono::time_point<Clock>;

std::vector<std::string> strings = {"one", "two", "three", "four", "five"};

std::chrono::duration<double> vector_push_back(const size_t n) {
    time_point start, end;
    start = Clock::now();

    std::vector<std::string> v;
    for (size_t i = 0; i < n; ++i) {
        v.push_back(strings[i % strings.size()]);
    }

    end = Clock::now();
    return end - start;
}

std::chrono::duration<double> vector_push_back_with_reserve(const size_t n) {
    time_point start, end;
    start = Clock::now();

    std::vector<std::string> v;
    v.reserve(n);
    for (size_t i = 0; i < n; ++i) {
        v.push_back(strings[i % strings.size()]);
    }

    end = Clock::now();
    return end - start;
}

std::chrono::duration<double> vector_element_assignment(const size_t n) {
    time_point start, end;
    start = Clock::now();

    std::vector<std::string> v(n);
    for (size_t i = 0; i < n; ++i) {
        v[i] = strings[i % strings.size()];
    }

    end = Clock::now();
    return end - start;
}

std::chrono::duration<double> vector_emplace_back(const size_t n) {
    time_point start, end;
    start = Clock::now();

    std::vector<std::string> v;
    for (size_t i = 0; i < n; ++i) {
        v.emplace_back(strings[i % strings.size()]);
    }

    end = Clock::now();
    return end - start;
}

std::chrono::duration<double> vector_emplace_back_with_reserve(const size_t n) {
    time_point start, end;
    start = Clock::now();

    std::vector<std::string> v;
    v.reserve(n);
    for (size_t i = 0; i < n; ++i) {
        v.emplace_back(strings[i % strings.size()]);
    }

    end = Clock::now();
    return end - start;
}

int main() {
    const size_t n = 10000;
    std::cout << "vector push_back: " << vector_push_back(n).count() << "\n";
    std::cout << "vector push_back with reserve: " << vector_push_back_with_reserve(n).count() << "\n";
    std::cout << "vector element assignment: " << vector_element_assignment(n).count() << "\n";
    std::cout << "vector emplace_back: " << vector_emplace_back(n).count() << "\n";
    std::cout << "vector emplace_back with reserve: " << vector_emplace_back_with_reserve(n).count() << "\n";
}

Results:

vector push_back: 0.00205563
vector push_back with reserve: 0.00152464
vector element assignment: 0.000610934
vector emplace_back: 0.00125141
vector emplace_back with reserve: 0.000545451

Conclusions

For most new code, using reserve() and emplace_back() (or push_back() for older code) should give you a good first-approximation for efficiency. If it isn't enough, profile it and find out where the bottleneck is. It probably won't be where you think it is.