Callgrind: Profile a specific part of my code

joetde picture joetde · Dec 3, 2012 · Viewed 7k times · Source

I'm trying to profile (with Callgrind) a specific part of my code by removing noise and computation that I don't care about. Here is an example of what I want to do:

for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    //Method to be profiled with these data
    //Post operation on the data
}

My use-case is a regression test, I want to make sure that the method in question is still fast enough (something like less than 10% extra instructions since the last implementation). This is why I'd like to have the cleaner output form Callgrind. (I need a for loop in order to have a significant amount of data processed in order to have a good estimation of the behavior of the method I want to profile)

My first try was to change the code to:

for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_START_INSTRUMENTATION;
    //Method to be profiled with these data
    CALLGRIND_STOP_INSTRUMENTATION;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;

Adding the Callgrind macros to control the instrumentation. I also added the --instr-atstart=no options to be sure that I profile only the part of the code I want...

Unfortunately with this configuration when I start to launch my executable with callgrind, it never ends... It is not a question of slowness, because a full instrumentation run last less than one minute.

I also tried

for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_TOGGLE_COLLECT;
    //Method to be profiled with these data
    CALLGRIND_TOGGLE_COLLECT;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;

(or the --toggle-collect="myMethod" option) But Callgrind returned me a log without any call (KCachegrind is white as snow :( and says zero instructions...)

Did I use the macros/options correctly? Any idea of what I need to change in order to get the expected result?

Answer

joetde picture joetde · Dec 4, 2012

I finally managed to solve this issue... This was a config issue:

I kept the code

for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_TOGGLE_COLLECT;
    //Method to be profiled with these data
    CALLGRIND_TOGGLE_COLLECT;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;

But ran the callgrind with --collect-atstart=no (and without the --instr-atstart=no!!!) and it worked perfectly, in a reasonable time (~1min).

The issue with START/STOP instrumentation was that callgrind dumps a file (callgrind.out.#number) at each iteration (each STOP) thus it was really really slow... (after 5min I had only 5000 runs for a 300 000 iterations benchmark... unsuitable for a regression test).