I'm writing a small program in C, and I want to measure its performance.
I want to see how much CPU time it uses and how many cache hits and misses it incurs. Information about context switches and memory usage would be nice to have too.
The program takes less than a second to execute.
I like the information in /proc/[pid]/stat, but I don't know how to read it after the program has exited or been killed.
Any ideas?
EDIT: I think Valgrind adds a lot of overhead. That's why I wanted a simple tool, like /proc/[pid]/stat, that is always there.
Use perf:
perf stat ./yourapp
See the kernel wiki perf tutorial for details. This uses the hardware performance counters of your CPU, so the overhead is very small.
Example from the wiki:
perf stat -B dd if=/dev/zero of=/dev/null count=1000000
Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':
           5,099  cache-misses       #    0.005 M/sec  (scaled from 66.58%)
         235,384  cache-references   #    0.246 M/sec  (scaled from 66.56%)
       9,281,660  branch-misses      #    3.858 %      (scaled from 33.50%)
     240,609,766  branches           #  251.559 M/sec  (scaled from 33.66%)
   1,403,561,257  instructions       #    0.679 IPC    (scaled from 50.23%)
   2,066,201,729  cycles             # 2160.227 M/sec  (scaled from 66.67%)
             217  page-faults        #    0.000 M/sec
               3  CPU-migrations     #    0.000 M/sec
              83  context-switches   #    0.000 M/sec
      956.474238  task-clock-msecs   #    0.999 CPUs

     0.957617512  seconds time elapsed
No need to load a kernel module manually; on a modern Debian system (with the linux-base package) it should just work. With the perf record -a / perf report combo you can also do full-system profiling. Any application or library that has debugging symbols will show up with details in the report.
For visualization, flame graphs seem to work well. (Update 2020: the Hotspot UI has flame graphs integrated.)