std deque is surprisingly slow

icl7126 picture icl7126 · Oct 2, 2012 · Viewed 7.7k times · Source

I just find out that standard std deque is really slow when comparing with my "home-made" version of stack that use pre-allocated array.
This is code of my stack:

template <class T>
class FastStack
{    
public:
    T* st;
    int allocationSize;
    int lastIndex;

public:
    FastStack(int stackSize);
    FastStack();
    ~FastStack();

    inline void resize(int newSize);
    inline void push(T x);
    inline void pop();
    inline T getAndRemove();
    inline T getLast();
    inline void clear();
};

template <class T>
FastStack<T>::FastStack()
{
    lastIndex = -1;
    st = NULL;
}

template <class T>
FastStack<T>::FastStack(int stackSize)
{
    st = NULL;
    this->allocationSize = stackSize;
    st = new T[stackSize];
    lastIndex = -1;
}

template <class T>
FastStack<T>::~FastStack()
{
    delete [] st;
}

template <class T>
void FastStack<T>::clear()
{
    lastIndex = -1;
}

template <class T>
T FastStack<T>::getLast()
{
    return st[lastIndex];
}

template <class T>
T FastStack<T>::getAndRemove()
{
    return st[lastIndex--];
}

template <class T>
void FastStack<T>::pop()
{
    --lastIndex;
}

template <class T>
void FastStack<T>::push(T x)
{
    st[++lastIndex] = x;
}

template <class T>
void FastStack<T>::resize(int newSize)
{
    if (st != NULL)
        delete [] st;
    st = new T[newSize];
}

.
I run this simple benchmark for deque:

std::deque<int> aStack;
int x;

HRTimer timer;
timer.Start();

for (int i = 0; i < 2000000000; i++)
{
    aStack.push_back(i);
    x = aStack.back();
    if (i % 100 == 0 && i != 0)
        for (int j = 0; j < 100; j++)
            aStack.pop_back();
}

double totalTime = timer.Stop();
stringstream ss;
ss << "std::deque " << totalTime;
log(ss.str());

.
Using std stack with vector as container (as 'Michael Kohne' suggested)

std::stack<int, std::vector<int>> bStack;
int x;

HRTimer timer;
timer.Start();

for (int i = 0; i < 2000000000; i++)
{
    bStack.push(i);
    x = bStack.top();
    if (i % 100 == 0 && i != 0)
        for (int j = 0; j < 100; j++)
            bStack.pop();
}

double totalTime = timer.Stop();
stringstream ss;
ss << "std::stack " << totalTime;
log(ss.str());

.
And the same one for my FastStack:

FastStack<int> fstack(200);
int x;

HRTimer timer;
timer.Start();

for (int i = 0; i < 2000000000; i++)
{
    fstack.push(i);
    x = fstack.getLast();
    if (i % 100 == 0 && i != 0)
        for (int j = 0; j < 100; j++)
            fstack.pop();
}

double totalTime = timer.Stop();
stringstream ss;
ss << "FastStack " << totalTime;
log(ss.str());

.
The results after 4 runs are as follows:
deque 15.529
deque 15.3756
deque 15.429
deque 15.4778

stack 6.19099
stack 6.1834
stack 6.19315
stack 6.19841

FastStack 3.01085
FastStack 2.9934
FastStack 3.02536
FastStack 3.00937

The results are in seconds and I was running the code on Intel i5 3570k (on default clock). I used VS2010 compiler with all optimizations that are available. I know that my FastStack has a lot of limitations but there are plenty situations, where it could be used and when it can give nice boost! (I used it in one project where I get 2x speed up comparing to std::queue).
So now my question is:
Are there any other "inhibitors" in C++ that everybody use but no one knows about them?
EDIT: I don't want to be offensive, I'm just curious if you know some unknown "speedups" like this.

Answer

Michael Kohne picture Michael Kohne · Oct 2, 2012

You are comparing apples and oranges. The dequeue is DOUBLE-ENDED, which requires it to be quite a bit different internally than the simple stack you've implemented. Try running your benchmark against an std::stack<int, std::vector<int> > instead and see how you do. std::stack is a container adapter, and if you use a vector as the underlying container, it should be nearly as fast as your home-grown implementation.

The one down side is that the std::stack implementation doesn't have a way for you to pre-set the size, so in situations where you KNOW what the max size needs to be, it'll be a little slower initially as it has to re-allocate several times. After that, it will be pretty much the same.