vector vs map performance confusion

Cechner · Jul 3, 2014 · Viewed 7k times

edit: I am specifically comparing std::vector's linear search operations to std::map's binary search operations, because that is what Herb's claim seemed to relate to. I know that using a binary search would move the performance from O(N) to O(log N), but that would not be testing Herb's claim

Bjarne Stroustrup and Herb Sutter have both recently talked about how awesome std::vector is in situations where one would expect std::list to be used, owing to the cost of cache misses during linked list traversal (see http://channel9.msdn.com/Events/Build/2014/2-661 at the 48 minute mark).

Herb made a further claim, however, that operations on an ordered vector are even faster than on std::map (see http://i.imgur.com/zX07TZR.png, taken from the 51:30 mark of the above channel9 video), which I found difficult to fathom. So I created a small test to demonstrate this and had difficulty reproducing these results: https://ideone.com/MN7DYK

This is the test code:

template <typename C>
void test(std::string name, std::vector<int> shuffledInputValues, C & c)
{
   // fill container 'c' with values from 'shuffledInputValues' then erase them all
   {
      std::cout << "testing " << name << "..." << std::endl;
      timer t;

      for (auto val : shuffledInputValues) insert(c, val);
      for (auto val : shuffledInputValues) remove(c, val);
   }
}

// output:
// testing vector...99.189ms
// testing deque...120.848ms
// testing set...4.077ms
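
The insert / remove helpers are in the full ideone listing; roughly, they look like the following sketch (a reconstruction, not a verbatim copy): the vector and deque overloads keep the container sorted with a linear scan, while the set overloads use the tree's own ordered insert and erase-by-value.

#include <set>
#include <vector>

void insert(std::vector<int> & v, int val)
{
   auto it = v.begin();
   while (it != v.end() && *it < val) ++it;   // linear search for the insertion point
   v.insert(it, val);
}

void remove(std::vector<int> & v, int val)
{
   auto it = v.begin();
   while (it != v.end() && *it != val) ++it;  // linear search for the value
   if (it != v.end()) v.erase(it);
}

void insert(std::set<int> & s, int val) { s.insert(val); }   // ordered insert, O(log n)
void remove(std::set<int> & s, int val) { s.erase(val); }    // erase by value, O(log n)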

Notice how std::vector performs an order of magnitude slower than std::set. Of course this is the result I expected, but I am confused about the claim that Herb was trying to make.

What am I doing wrong? Or am I misunderstanding Herb's claim?

Notes on my test app:

  • it must use linear operations - the whole point of the exercise is to demonstrate CPU cache magic; these are the constraints Herb and Bjarne put on the exercise
  • I didn't try any tricky loop unrolling for my vector iteration, but I believe the iteration isn't affecting performance much anyway
  • I limited the loop to 10K items because ideone times out on larger sets, but increasing the size doesn't change the results

edit: see https://ideone.com/916fVd for a modified example that only compares the performance of lookups. Linear searching exhibits the same performance.

Answer

eerorika · Jul 3, 2014

I found the slides for easier reference (I can't see the graphs, but I guess that might be because of a proprietary file format). A relevant slide is number 39, which describes the problem being solved:

§ Generate N random integers and insert them into a sequence so that each is inserted in its proper position in the numerical order.

§ Remove elements one at a time by picking a random position in the sequence and removing the element there.

Now, it should be rather obvious that a linked list is not a good choice for this problem. Even though a list is much better than a vector for inserting/removing at the beginning or in the middle, it's not good for inserting/removing at a random position because of the need for a linear search. And a linear search is much faster with vectors because of better cache efficiency.

Sutter suggests that a map (or a tree in general) would seem a natural choice for this algorithm because you get O(log n) search. And indeed, it does beat the vector quite easily for large N values in the insertion part.

Here comes the but. You need to remove the nth element (for random n). This is where I believe your code is cheating. You remove the elements in the order they were inserted, effectively using the input vector as a lookup table for finding the value of an element at a "random" position, so that you can search for it in O(log n). So you're really using a combination of a set and a vector to solve the problem.

A regular binary search tree such as the one used for std::map or std::set (which I assume Sutter used) doesn't have a fast algorithm for finding the nth element. Here's one which is claimed to be O(log n) on average and O(n) in the worst case. But std::map and std::set don't provide access to the underlying tree structure, so for those you're stuck with in-order traversal (correct me if I'm wrong), which is a linear search again! I'm actually surprised that the map version is competitive with the vector one in Sutter's results.
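
To make that concrete, here is a small illustrative sketch (not Sutter's code; remove_at is just a name for the example) of erasing the element at a random position in a vector versus in a std::set:

#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

void remove_at(std::vector<int> & v, std::size_t pos)
{
   v.erase(v.begin() + pos);       // position is reached directly; cost is only the shift
}

void remove_at(std::set<int> & s, std::size_t pos)
{
   auto it = s.begin();
   std::advance(it, pos);          // linear in-order walk just to find the node
   s.erase(it);
}

The vector pays only for shifting the tail, while the set has to walk the in-order sequence before it can even start the erase.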

For O(log n) complexity, you need a structure such as an order statistic tree, which is unfortunately not provided by the standard library. There's the GNU policy-based STL map, as shown here.
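
For a rough idea of what that looks like, here is a minimal sketch using the GCC-specific __gnu_pbds extension (not standard C++):

#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

using ost = __gnu_pbds::tree<
   int,                                           // key
   __gnu_pbds::null_type,                         // no mapped value, i.e. a set
   std::less<int>,
   __gnu_pbds::rb_tree_tag,
   __gnu_pbds::tree_order_statistics_node_update>;

// ost t;
// t.insert(x);                    // ordered insert, O(log n)
// auto it = t.find_by_order(k);   // iterator to the k-th smallest, O(log n)
// t.erase(it);                    // removal at a random position, O(log n)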

Here is a quick test I made for vector vs set vs ost (vs vector with binary search for good measure): https://ideone.com/DoqL4H. The set is much slower, while the other tree-based structure is faster than the vector, which is not in line with Sutter's results.

order statistics tree: 15.958ms
vector binary search: 99.661ms
vector linear search: 282.032ms
set: 2004.04ms

(N = 20000; the difference is only going to grow in favor of the ost with bigger values)
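
For completeness, the "vector binary search" variant amounts to something like this sketch (insert_sorted / erase_at are illustrative names; the actual code is in the ideone link). std::lower_bound finds the position in O(log n), though the insert / erase itself still shifts elements:

#include <algorithm>
#include <cstddef>
#include <vector>

void insert_sorted(std::vector<int> & v, int val)
{
   v.insert(std::lower_bound(v.begin(), v.end(), val), val);
}

void erase_at(std::vector<int> & v, std::size_t pos)
{
   v.erase(v.begin() + pos);   // position is known, no search needed
}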

In short, I came to the same conclusion that Sutter's original results seem odd, but for a slightly different reason: it seems to me that better asymptotic complexity beats lower constant factors this time.

Note that the problem description doesn't exclude the possibility of duplicate random values, so using map / set instead of multimap / multiset is cheating a bit in favor of the map / set, but I assume that has only minor significance when the value domain is much larger than N. Also, pre-reserving the vector doesn't improve the performance significantly (around 1% when N = 20000).