Does large use of signals and slots affect application performance?

lucaboni picture lucaboni · May 31, 2012 · Viewed 13.5k times · Source

The question is just done for educational purpose:

Does the use of 30-50 or more pairs of signals and slots between two object (for example two threads) affect the application performance, runtime or response times?

Answer

First of all, you should probably not put any slots in QThreads. QThreads aren't really meant to be derived from other than by reimplementing the run method and private methods (not signals!).

A QThread is conceptually a thread controller, not a thread itself. In most cases you should deal with QObjects. Start a thread, then move the object instance to that thread. That's the only way you'll get slots working correctly in the thread. Moving the thread instance (it is QObject-derived!) to the thread is a hack and bad style. Don't do that in spite of uninformed forum posts telling otherwise.

As to the rest of your question: a signal-slot call does not have to locate anything nor validate much. The "location" and "validation" is done when the connection is established. The main steps done at the time of the call are:

  1. Locking a signal-slot mutex from a pool.

  2. Iterating through the connection list.

  3. Performing the calls using either direct or queued calls.

Common Cost

Any signal-slot call always starts as a direct call in the signal's implementation generated by moc. An array of pointers-to-arguments of the signal is constructed on the stack. The arguments are not copied.

The signal then calls QMetaObject::activate, where the connection list mutex is acquired, and the list of connected slots is iterated, placing the call for each slot.

Direct Connections

Not much is done there, the slot is called by either directly calling QObject::qt_static_metacall obtained at the time the connection was established, or QObject::qt_metacall if the QMetaObject::connect was used to setup the connection. The latter allows dynamic creation of signals and slots.

Queued Connections

The arguments have to marshalled and copied, since the call has to be stored in an event queue and the signal must return. This is done by allocating an array of pointers to copies, and copy-consting each argument on the heap. The code to do that is really no-frills plain old C.

The queuing of the call is done within queued_activate. This is where the copy-construction is done.

The overhead of a queued call is always at least one heap allocation of QMetaCallEvent. If the call has any arguments, then a pointers-to-arguments array is allocated, and an extra allocation is done for each argument. For a call with n arguments, the cost given as a C expression is (n ? 2+n : 1) allocations. A return value for blocking calls is counter as an argument. Arguably, this aspect of Qt could be optimized down to one allocation for everything, but in real life it'd only matter if you're calling trivial methods.

Benchmark Results

Even a direct (non-queued) signal-slot call has a measurable overhead, but you have to choose your battles. Ease of architecting the code vs. performance. You do measure performance of your final application and identify bottlenecks, do you? If you do, you're likely to see that in real-life applications, signal-slot overheads play no role.

The only time signal-slot mechanism has significant overhead is if you're calling trivial functions. Say, if you'd call the trivial slot in the code below. It's a complete, stand-alone benchmark, so feel free to run it and see for yourself. The results on my machine were:

Warming up the caches...
trivial direct call took 3ms
nonTrivial direct call took 376ms
trivial direct signal-slot call took 158ms, 5166% longer than direct call.
nonTrivial direct signal-slot call took 548ms, 45% longer than direct call.
trivial queued signal-slot call took 2474ms, 1465% longer than direct signal-slot and 82366% longer than direct call.
nonTrivial queued signal-slot call took 2474ms, 416% longer than direct signal-slot and 653% longer than direct call.

What should be noted, perhaps, is that concatenating strings is quite fast :)

Note that I'm doing the calls via a function pointer, this is to prevent the compiler from optimizing out the direct calls to the addition function.

//main.cpp
#include <cstdio>
#include <QCoreApplication>
#include <QObject>
#include <QTimer>
#include <QElapsedTimer>
#include <QTextStream>

static const int n = 1000000;

class Test : public QObject
{
    Q_OBJECT
public slots:
    void trivial(int*, int, int);
    void nonTrivial(QString*, const QString&, const QString&);
signals:
    void trivialSignalD(int*, int, int);
    void nonTrivialSignalD(QString*, const QString&, const QString &);
    void trivialSignalQ(int*, int, int);
    void nonTrivialSignalQ(QString*, const QString&, const QString &);
private slots:
    void run();
private:
    void benchmark(bool timed);
    void testTrivial(void (Test::*)(int*,int,int));
    void testNonTrivial(void (Test::*)(QString*,const QString&, const QString&));
public:
    Test();
};

Test::Test()
{
    connect(this, SIGNAL(trivialSignalD(int*,int,int)),
            SLOT(trivial(int*,int,int)), Qt::DirectConnection);
    connect(this, SIGNAL(nonTrivialSignalD(QString*,QString,QString)),
            SLOT(nonTrivial(QString*,QString,QString)), Qt::DirectConnection);
    connect(this, SIGNAL(trivialSignalQ(int*,int,int)),
            SLOT(trivial(int*,int,int)), Qt::QueuedConnection);
    connect(this, SIGNAL(nonTrivialSignalQ(QString*,QString,QString)),
            SLOT(nonTrivial(QString*,QString,QString)), Qt::QueuedConnection);
    QTimer::singleShot(100, this, SLOT(run()));
}

void Test::run()
{
    // warm up the caches
    benchmark(false);
    // do the benchmark
    benchmark(true);
}

void Test::trivial(int * c, int a, int b)
{
    *c = a + b;
}

void Test::nonTrivial(QString * c, const QString & a, const QString & b)
{
    *c = a + b;
}

void Test::testTrivial(void (Test::* method)(int*,int,int))
{
    static int c;
    int a = 1, b = 2;
    for (int i = 0; i < n; ++i) {
        (this->*method)(&c, a, b);
    }
}

void Test::testNonTrivial(void (Test::* method)(QString*, const QString&, const QString&))
{
    static QString c;
    QString a(500, 'a');
    QString b(500, 'b');
    for (int i = 0; i < n; ++i) {
        (this->*method)(&c, a, b);
    }
}

static int pct(int a, int b)
{
    return (100.0*a/b) - 100.0;
}

void Test::benchmark(bool timed)
{
    const QEventLoop::ProcessEventsFlags evFlags =
            QEventLoop::ExcludeUserInputEvents | QEventLoop::ExcludeSocketNotifiers;
    QTextStream out(stdout);
    QElapsedTimer timer;
    quint64 t, nt, td, ntd, ts, nts;

    if (!timed) out << "Warming up the caches..." << endl;

    timer.start();
    testTrivial(&Test::trivial);
    t = timer.elapsed();
    if (timed) out << "trivial direct call took " << t << "ms" << endl;

    timer.start();
    testNonTrivial(&Test::nonTrivial);
    nt = timer.elapsed();
    if (timed) out << "nonTrivial direct call took " << nt << "ms" << endl;

    QCoreApplication::processEvents(evFlags);

    timer.start();
    testTrivial(&Test::trivialSignalD);
    QCoreApplication::processEvents(evFlags);
    td = timer.elapsed();
    if (timed) {
        out << "trivial direct signal-slot call took " << td << "ms, "
               << pct(td, t) << "% longer than direct call." << endl;
    }

    timer.start();
    testNonTrivial(&Test::nonTrivialSignalD);
    QCoreApplication::processEvents(evFlags);
    ntd = timer.elapsed();
    if (timed) {
        out << "nonTrivial direct signal-slot call took " << ntd << "ms, "
               << pct(ntd, nt) << "% longer than direct call." << endl;
    }

    timer.start();
    testTrivial(&Test::trivialSignalQ);
    QCoreApplication::processEvents(evFlags);
    ts = timer.elapsed();
    if (timed) {
        out << "trivial queued signal-slot call took " << ts << "ms, "
               << pct(ts, td) << "% longer than direct signal-slot and "
               << pct(ts, t) << "% longer than direct call." << endl;
    }

    timer.start();
    testNonTrivial(&Test::nonTrivialSignalQ);
    QCoreApplication::processEvents(evFlags);
    nts = timer.elapsed();
    if (timed) {
        out << "nonTrivial queued signal-slot call took " << nts << "ms, "
               << pct(nts, ntd) << "% longer than direct signal-slot and "
               << pct(nts, nt) << "% longer than direct call." << endl;
    }
}

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    Test t;
    return a.exec();
}

#include "main.moc"