Intel TBB vs Boost

David · Aug 20, 2011 · Viewed 14.6k times

In my new application I have the flexibility to decide which library to use for multi-threading. So far I have been using pthreads; now I want to explore a cross-platform library. I have zeroed in on TBB and Boost, but I don't understand what the benefit of TBB over Boost is. I am trying to find out the advantages of TBB over Boost. The Wikipedia article on TBB says: "Instead the library abstracts access to the multiple processors by allowing the operations to be treated as "tasks", which are allocated to individual cores dynamically by the library's run-time engine, and by automating efficient use of the cache. A TBB program creates, synchronizes and destroys graphs of dependent tasks according to algorithms."

But does a threading library even need to worry about allocating threads to cores? Isn't that the job of the operating system? So what is the real benefit of using TBB over Boost?

Answer

Alexey Kukanov · Aug 20, 2011

But does a threading library even need to worry about allocating threads to cores? Isn't that the job of the operating system? So what is the real benefit of using TBB over Boost?

You are right, a threading library usually should not care about mapping threads to cores, and TBB does not. TBB operates with tasks, not threads. TBB's scheduler utilizes all cores by allocating a pool of threads and letting them dynamically select which tasks to run. This is the main advantage over Boost, with which you would need to map the available work to threads manually. On top of that, TBB offers high-level constructs such as parallel_for, parallel_pipeline, etc. that express the most common parallel patterns and hide all the task manipulation.

For example, let's take a piece of code that calculates the points of the Mandelbrot fractal (taken from http://warp.povusers.org/Mandelbrot/, variable initialization omitted):

for(unsigned y=0; y<ImageHeight; ++y)
{
    double c_im = MaxIm - y*Im_factor;          // imaginary part of c for this row
    for(unsigned x=0; x<ImageWidth; ++x)
    {
        double c_re = MinRe + x*Re_factor;      // real part of c for this column

        // iterate z = z^2 + c until it escapes or MaxIterations is reached
        double Z_re = c_re, Z_im = c_im;
        bool isInside = true;
        for(unsigned n=0; n<MaxIterations; ++n)
        {
            double Z_re2 = Z_re*Z_re, Z_im2 = Z_im*Z_im;
            if(Z_re2 + Z_im2 > 4)
            {
                isInside = false;
                break;
            }
            Z_im = 2*Z_re*Z_im + c_im;
            Z_re = Z_re2 - Z_im2 + c_re;
        }
        if(isInside) { putpixel(x, y); }        // points that never escaped belong to the set
    }
}

Now, to make it parallel with TBB, all you need to do is convert the outermost loop into tbb::parallel_for (I use a C++11 lambda for brevity):

tbb::parallel_for(0u, ImageHeight, [=](unsigned y)  // 0u: both bounds must have the same type
{
    // the rest of the code is exactly the same
    double c_im = MaxIm - y*Im_factor;
    for(unsigned x=0; x<ImageWidth; ++x)
    {
        ...
        // if putpixel() is not thread safe, a lock might be needed
        if(isInside) { putpixel(x, y); }
    }
});
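
If you want finer control over how the iterations are grouped, parallel_for also has an overload that takes a tbb::blocked_range. Here is a rough sketch of the same loop in that form (the grain size is left to the default partitioner, and the loop body is unchanged):

tbb::parallel_for(tbb::blocked_range<unsigned>(0, ImageHeight),
    [=](const tbb::blocked_range<unsigned>& rows)
{
    // each invocation processes a contiguous block of rows chosen by the scheduler
    for(unsigned y = rows.begin(); y != rows.end(); ++y)
    {
        double c_im = MaxIm - y*Im_factor;
        // ... same inner loops over x and n as above ...
    }
});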

TBB will automatically distribute the loop iterations over the available cores (you don't need to care how many there are) and dynamically balance the load, so that if some thread has more work to do, the other threads don't just wait for it but help out, maximizing CPU utilization. Try implementing that with raw threads, and you will feel the difference :)
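
For comparison, here is roughly what a manual version with Boost threads might look like. This is only a sketch: compute_row() is a hypothetical helper wrapping the body of the outer loop above, and the rows are split into fixed chunks up front.

#include <boost/thread.hpp>
#include <algorithm>

void compute_row(unsigned y);   // hypothetical: the body of the original outer loop

void render_with_boost_threads(unsigned ImageHeight, unsigned numThreads)
{
    boost::thread_group workers;
    unsigned chunk = (ImageHeight + numThreads - 1) / numThreads;
    for(unsigned i = 0; i < numThreads; ++i)
    {
        unsigned begin = i * chunk;
        unsigned end   = std::min(begin + chunk, ImageHeight);
        // each thread gets a fixed block of rows; a thread that finishes
        // early cannot take over rows assigned to a slower one
        workers.create_thread([=] { for(unsigned y = begin; y < end; ++y) compute_row(y); });
    }
    workers.join_all();   // wait until every worker has finished its block
}

Even this static split needs explicit chunking and joining, and it still does not balance the load when some rows take much longer than others, which is exactly what happens with the Mandelbrot set.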