boost::dynamic_bitset slower than std::bitset unless std::bitset is reset

nick picture nick · Aug 19, 2014 · Viewed 11.7k times · Source

I recently came across the bitset templates and would really like to use them in my current project. Reading on, I see that the std::bitset template must have a size determined at compile time. Many suggest using the boost::dynamic_bitset to alleviate this requirement.

To compare the two, I decided to do a speed comparison of set, flip, and count methods.

The results are quite odd...and I'm wondering if someone can shed some light on it for me.

The code is at the end of the post, but I'll explain what I'm doing here. I have one std::bitset object (call it bs) and one boost::dynamic_bitset object (call it dynbs). Each has n=1000000 bits. For a given method above, call the method on each of the n bits sequentially and repeat this R=10000 times.

Using the std::chrono library, here are the timings for each in nanoseconds:

set
        bitset:              267 nsecs
    dyn bitset:      18603174546 nsecs

flip
        bitset:               73 nsecs
    dyn bitset:      18842352867 nsecs

count
        bitset:               77 nsecs
    dyn bitset:               51 nsecs

The boost::dynamic_bitset seems to be much slower for set and flip.

To make it more interesting, if the method reset is called on the two objects before running these tests, then the timings are comparable. Here they are:

set
        bitset:      19397779399 nsecs
    dyn bitset:      18472863864 nsecs

flip
        bitset:      18599248629 nsecs
    dyn bitset:      18376267939 nsecs

count
        bitset:               68 nsecs
    dyn bitset:               61 nsecs

Now, both containers claim to initialize all bits to 0, thus a call to reset shouldn't change any of the bits. Dumping the output of none before and after reset does confirm this.

So after all that, I have two questions:

1) Why is the boost::dynamic_bitset so much slower than the std::bitset when calling set and flip?

2) Why does calling reset have a huge negative effect on the speed of std::bitset?

Here is my code:

#include <iostream>
#include <iomanip>
#include <bitset>
#include <boost/dynamic_bitset.hpp>
#include <vector>
#include <chrono>
#include <ctime>

using namespace std;
using namespace chrono;
using namespace boost;

int main(){
  const unsigned int n=1000000;
  bitset< n > bs;
  dynamic_bitset< > dynbs(n);
  // bs.reset();
  // dynbs.reset();

  unsigned int i,r,R=10000;
  high_resolution_clock::time_point tick,tock;


  ////////////////////////////////////////////////////////////
  // Method: set

  std::cout << "set" << std::endl;

  tick=high_resolution_clock::now();

  for(r=0; r<R; r++)
    for(i=0; i<n; i++)
      bs.set(i);

  tock=high_resolution_clock::now();
  cout << setw(16) << "bitset: " 
       << setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs" 
       << std::endl;

  tick=high_resolution_clock::now();

  for(r=0; r<R; r++)
    for(i=0; i<n; i++)
      dynbs.set(i);

  tock=high_resolution_clock::now();
  cout << setw(16) << "dyn bitset: " 
       << setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
       << std::endl << std::endl;


  ////////////////////////////////////////////////////////////
  // Method: flip

  std::cout << "flip" << std::endl;

  tick=high_resolution_clock::now();

  for(r=0; r<R; r++)
    for(i=0; i<n; i++)
      bs.flip(i);

  tock=high_resolution_clock::now();
  cout << setw(16) << "bitset: " 
       << setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
       << std::endl;

  tick=high_resolution_clock::now();

  for(r=0; r<R; r++)
    for(i=0; i<n; i++)
      dynbs.flip(i);

  tock=high_resolution_clock::now();
  cout << setw(16) << "dyn bitset: " 
       << setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
       << std::endl << std::endl;


  ////////////////////////////////////////////////////////////
  // Method: count

  std::cout << "count" << std::endl;

  tick=high_resolution_clock::now();

  for(r=0; r<R; r++)
    for(i=0; i<n; i++)
      bs.count();

  tock=high_resolution_clock::now();
  cout << setw(16) << "bitset: " 
       << setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
       << std::endl;

  tick=high_resolution_clock::now();

  for(r=0; r<R; r++)
    for(i=0; i<n; i++)
      dynbs.count();

  tock=high_resolution_clock::now();
  cout << setw(16) << "dyn bitset: " 
       << setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
       << std::endl;


  return 0;
}

I compiled it with

g++ -O3 -std=c++11 bitset.cpp -o bitset

where bitset.cpp is the code inserted above.

Answer

T.C. picture T.C. · Aug 21, 2014

1) Why is the boost::dynamic_bitset so much slower than the std::bitset when calling set and flip?

Since std::bitset does not use dynamic allocation, and your bitset is a local variable, the compiler can easily determine that all the set's and flips and counts have no externally visible effect. As a result, it optimizes them all away, and your code basically ends up being a bunch of timing and printing calls.

Note that in the above assembly it doesn't even allocate stack space for the bitset. The whole thing basically vanished without a trace.

boost::dynamic_bitset allocates its buffer dynamically with new, which ends up calling ::operator new(), which can be an arbitrary overloaded version defined in a different translation unit. So it's very difficult for the compiler to prove that changes to the buffer returned are not visible externally.

Your count loops for both std::bitset and boost::dynamic_bitset are optimized away because the compiler can easily see that count() doesn't change anything in the bitsets and you don't use the return value.

2) Why does calling reset have a huge negative effect on the speed of std::bitset?

I checked the source code of reset in GCC, and it calls a compiler intrinsic __builtin_memset, passing it a pointer to the buffer. When you pass a pointer to a stack variable to an external function, then the compiler is much more constrained in what it can remove, since changes in the variable may now be externally observable (for example, the function called could have stored a copy of the pointer somewhere and something can peek into it later).