I just started using google performance tools (google-perftools
and libgoogle-perftools4
packages in ubuntu), I swear that I'm googling for around a day and I didn't find an answer!!
The problem is that I do not get the result for ALL of my functions with CPU profiling. This is my code:
#include "gperftools/profiler.h"
#include <iostream>
#include <math.h>
using namespace std;
void bar()
{
int a,b,c,d,j,k;
a=0;
int z=0;
b = 1000;
while(z < b)
{
while (a < b)
{
d = sin(a);
c = cos(a);
j = tan(a);
k = tan(a);
k = d * c + j *k;
a++;
}
a = 0;
z++;
}
}
void foo()
{
cout << "hey " << endl;
}
int main()
{
ProfilerStart("/home/mohammad/gperf/dump.txt");
int a = 1000;
while(a--){foo();}
bar();
ProfilerFlush();
ProfilerStop();
}
Compiled as g++ test.cc -lprofiler -o a.out
this is how I run the code:
CPUPROFILE=dump.txt ./a.out
I've also tried this:
CPUPROFILE_FREQUENCY=10000 LD_PRELOAD=/usr/local/lib/libprofiler.so.0.3.0 CPUPROFILE=dump.txt ./a.out
And this is what I get from google-pprof --text a.out dump.txt
:
Using local file ./a.out.
Using local file ./dump.txt.
Total: 22 samples
8 36.4% 36.4% 8 36.4% 00d8cb04
6 27.3% 63.6% 6 27.3% bar
3 13.6% 77.3% 3 13.6% __cos (inline)
2 9.1% 86.4% 2 9.1% 00d8cab4
1 4.5% 90.9% 1 4.5% 00d8cab6
1 4.5% 95.5% 1 4.5% 00d8cb06
1 4.5% 100.0% 1 4.5% __write_nocancel
0 0.0% 100.0% 3 13.6% __cos
But there is no information about the foo function!
my system information: ubuntu 12.04 g++ 4.6.3
Thats all!
TL;DR: foo
is to fast and small to get profiling events, run it 100 more times. Frequency setting was with typo, and pprof
will not sample more often than CONFIG_HZ (usually 250). It is better to switch to more modern Linux perf
profiler (tutorial from its authors, wikipedia).
Long version:
Your foo
function is just too short and simple - just call two functions. Compiled the test with g++ test.cc -lprofiler -o test.s -S -g
, with filtering of test.s
with c++filt
program to make c++ names readable:
foo():
.LFB972:
.loc 1 27 0
pushq %rbp
movq %rsp, %rbp
.loc 1 28 0
movl $.LC0, %esi
movl std::cout, %edi
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
movl std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&), %esi
movq %rax, %rdi
call std::basic_ostream<char, std::char_traits<char> >::operator<<(std::basic_ostream<char, std::char_traits<char> >& (*)(std::basic_ostream<char, std::char_traits<char> >&))
.loc 1 29 0
popq %rbp
ret
.LFE972:
.size foo(), .-foo()
So, to see it in the profile you should run foo
for more times, changing int a = 1000;
in main to something much greater, like 10000 or better 100000 (as did I for the test).
Also you may fix the incorrect "CPUPROFILE_FREQUENC=10000
" to correct CPUPROFILE_FREQUENCY
(note the Y
). I should say that 10000 is too high setting for CPUPROFILE_FREQUENCY, because it usually may only generate 1000 or 250 events per second depending on kernel configuration CONFIG_HZ
(most 3.x kernels have 250, check grep CONFIG_HZ= /boot/config*
). And default setting for CPUPROFILE_FREQUENCY in pprof is 100.
I tested different values of CPUPROFILE_FREQUENCY: 100000, 10000, 1000, 250 with bash script on Ubuntu 14.04
for a in 100000 100000 10000 10000 1000 1000 300 300 250 250; do
echo -n "$a ";
CPUPROFILE_FREQUENCY=$a CPUPROFILE=dump$a.txt ./test >/dev/null;
done
And the results were the same 120-140 events and runtime of every ./test around 0.5 seconds, so cpuprofiler from google-perftools can't do more events per second for single thread, than CONFIG_HZ set in kernel (my has 250).
100000 PROFILE: interrupts/evictions/bytes = 124/1/6584
100000 PROFILE: interrupts/evictions/bytes = 134/0/7864
10000 PROFILE: interrupts/evictions/bytes = 125/0/7488
10000 PROFILE: interrupts/evictions/bytes = 123/0/6960
1000 PROFILE: interrupts/evictions/bytes = 134/0/6264
1000 PROFILE: interrupts/evictions/bytes = 125/2/7272
300 PROFILE: interrupts/evictions/bytes = 137/2/7984
300 PROFILE: interrupts/evictions/bytes = 126/0/7216
250 PROFILE: interrupts/evictions/bytes = 123/3/6680
250 PROFILE: interrupts/evictions/bytes = 137/2/7352
With original a=1000 foo
and cout's functions runs too fast to get any profiling event (even on 250 events/s) on them in every run, so you have no foo
, nor any Input/Output functions. In small amount of runs, the __write_nocancel
may got sampling event, and then foo
and I/O functions form libstdc++ will be reported (somewhere not in the very top, so use --text
option of pprof
or google-pprof
) with zero self event count, and non-zero child event count:
....
1 0.9% 99.1% 1 0.9% __write_nocancel
....
0 0.0% 100.0% 1 0.9% _IO_new_file_overflow
0 0.0% 100.0% 1 0.9% _IO_new_file_write
0 0.0% 100.0% 1 0.9% __GI__IO_putc
0 0.0% 100.0% 1 0.9% foo
0 0.0% 100.0% 1 0.9% new_do_write
0 0.0% 100.0% 1 0.9% std::endl
0 0.0% 100.0% 1 0.9% std::ostream::put
With a=100000
, foo is still too short and fast to get own events, but I/O functions got several. This is list I grepped from long --text
output:
34 24.6% 24.6% 34 24.6% __write_nocancel
1 0.7% 95.7% 35 25.4% __GI__IO_fflush
1 0.7% 96.4% 1 0.7% __GI__IO_putc
1 0.7% 97.8% 2 1.4% std::operator<<
1 0.7% 98.6% 36 26.1% std::ostream::flush
1 0.7% 99.3% 2 1.4% std::ostream::put
0 0.0% 100.0% 34 24.6% _IO_new_file_sync
0 0.0% 100.0% 34 24.6% _IO_new_file_write
0 0.0% 100.0% 40 29.0% foo
0 0.0% 100.0% 34 24.6% new_do_write
0 0.0% 100.0% 2 1.4% std::endl
Functions with zero own counters seen only thanks to pprof
capability of reading call chains (it knows who calls the functions which got sample, if frame info is not omitted).
I can also recommend more modern, more capable (both software and hardware events, up to 5 kHz frequency or more; both user-space and kernel profiling) and better supported profiler, the Linux perf
profiler (tutorial from its authors, wikipedia).
There are results from perf
with a=10000
:
$ perf record ./test3 >/dev/null
... skip some perf's spam about inaccessibility of kernel symbols
... note the 3 kHz frequency here VVVV
Lowering default frequency rate to 3250.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.078 MB perf.data (~3386 samples) ]
To see text report from perf.data
output file I'll use less (because perf report
by default starts interactive profile browser):
$ perf report |less
... skip some extra info about the machine, kernel, and perf starting command
# Samples: 1K of event 'cycles'
# Event count (approx.): 1155264208
# Overhead Command Shared Object Symbol
41.94% test3 libm-2.19.so [.] __tan_sse2
16.95% test3 libm-2.19.so [.] __sin_sse2
13.40% test3 libm-2.19.so [.] __cos_sse2
4.93% test3 test3 [.] bar()
2.90% test3 libc-2.19.so [.] __GI___libc_write
....
0.20% test3 test3 [.] foo()
Or perf report -n | less
to see raw event (sample) counters:
# Overhead Samples Command Shared Object
41.94% 663 test3 libm-2.19.so [.] __tan_sse2
16.95% 268 test3 libm-2.19.so [.] __sin_sse2
13.40% 212 test3 libm-2.19.so [.] __cos_sse2
4.93% 78 test3 test3 [.] bar()
2.90% 62 test3 libc-2.19.so [.] __GI___libc_write
....
0.20% 4 test3 test3 [.] foo()