Is the fastcall calling convention really faster than other calling conventions, such as cdecl? Are there any benchmarks out there that show how performance is affected by calling convention?
It depends on the platform. For a Xenon PowerPC, for example, it can be an order of magnitude difference due to a load-hit-store issue with passing data on the stack. I empirically timed the overhead of a cdecl
function at about 45 cycles compared to ~4 for a fastcall
.
For an out-of-order x86 (Intel and AMD), the impact may be much less, because the registers are all shadowed and renamed anyway.
The answer really is that you need to benchmark it yourself on the particular platform you care about.