I'm currently developing a very fast algorithm, one part of which is an extremely fast scanner and statistics function. In this quest I'm after any performance benefit, but I'm also interested in keeping the code "multi-thread" friendly.
Now for the question: I've noticed that putting some very frequently accessed variables and arrays into global storage, or into "static local" storage (which amounts to the same thing), gives a measurable performance benefit (in the range of +10%). I'm trying to understand why, and to find a solution, since I would prefer to avoid these types of allocation. Note that I don't think the difference comes from the allocation itself, since allocating a few variables and a small array on the stack is almost instantaneous. I believe the difference comes from accessing and modifying the data.
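To make the comparison concrete, here is a minimal sketch of the kind of function being described: a byte-frequency scanner written once with its histogram in static storage and once with it on the stack. The function names and the 256-bucket histogram are assumptions for illustration, not the author's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-frequency scanner, written two ways so the two
   storage choices can be benchmarked against each other. */

/* Version A: histogram in static storage (global-like).
   Not reentrant: two threads calling this at once would corrupt each
   other's counts. */
static unsigned g_hist[256];

const unsigned *histogram_static(const unsigned char *buf, size_t len)
{
    memset(g_hist, 0, sizeof g_hist);
    for (size_t i = 0; i < len; i++)
        g_hist[buf[i]]++;
    return g_hist;
}

/* Version B: histogram in this call's stack frame.
   Each call works on its own array, so concurrent calls are safe. */
void histogram_local(const unsigned char *buf, size_t len, unsigned hist[256])
{
    unsigned local[256] = {0};          /* automatic storage, one copy per call */
    for (size_t i = 0; i < len; i++)
        local[buf[i]]++;
    memcpy(hist, local, sizeof local);  /* hand the result back to the caller */
}
```

Counting into a separate local array, rather than directly through the `hist` pointer, may also help the optimizer, because a freshly created local array cannot overlap the caller's `buf`.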
In my search, I found this old post on Stack Overflow: C++ performance of global variables
But I'm very disappointed by the answers there: very little explanation, mostly ranting about "you should not do that" (hey, that's not the question!) and very rough statements like "it doesn't affect performance", which is obviously incorrect, since I'm measuring the difference with precise benchmark tools.
As said above, I'm looking for an explanation and, if one exists, a solution to this issue. So far, I have the feeling that calculating the memory address of a local (automatic) variable costs a bit more than for a global (or static local) one, maybe something like an extra ADD operation. But that doesn't help me find a solution...
It really depends on your compiler, platform, and other details. However, I can describe one scenario where global variables are faster.
In many cases, a global variable is at a fixed offset. This allows the generated instructions to simply use that address directly (something along the lines of MOV AX,[MyVar]).
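As a hedged illustration, the two functions below perform the same increment, once on a variable with a fixed address and once through a pointer whose value is only known at run time. The names are hypothetical, and the assembly quoted in the comments is only what GCC at -O2 typically emits on x86-64 in Intel syntax (`-masm=intel`); other compilers, flags and targets will differ, so inspect your own build (for example with `gcc -O2 -S`).

```c
/* Same increment on a fixed-address variable and on one reached
   through a pointer. Compare the generated assembly for each. */

static int g_counter;   /* static storage: location fixed at link/load time */

void bump_global(void)
{
    /* Typically one instruction whose operand is a fixed offset from the
       instruction pointer, e.g. `add DWORD PTR g_counter[rip], 1`. */
    g_counter++;
}

void bump_via_pointer(int *counter)
{
    /* The address arrives in a register at run time,
       e.g. `add DWORD PTR [rdi], 1` under the x86-64 System V ABI. */
    (*counter)++;
}

int read_global(void)
{
    return g_counter;
}
```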
However, if the variable is located relative to the current stack pointer, or is a member of a class or an element of an array, some math is required: the code must take the base address (of the stack frame, the object, or the array) and add an offset to determine the address of the actual variable.
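The address arithmetic described above can be written out by hand. This sketch, using a hypothetical `struct stats`, computes an array element's address as base + i * sizeof(element) and a struct member's address as base + offsetof(member). On x86 the compiler can often fold this into a single addressing mode such as `[rdi+rsi*4]`, so whether it costs anything measurable depends on the surrounding code.

```c
#include <stddef.h>

/* Hypothetical statistics record used only for illustration. */
struct stats {
    unsigned count;
    unsigned long long sum;
};

/* &arr[i] computed explicitly: base address plus i elements,
   which is what the generated code must do when i is a run-time value. */
const int *element_address(const int *arr, size_t i)
{
    return (const int *)((const char *)arr + i * sizeof *arr);
}

/* &s->sum computed explicitly: base pointer plus a compile-time offset. */
const unsigned long long *sum_address(const struct stats *s)
{
    return (const unsigned long long *)((const char *)s + offsetof(struct stats, sum));
}
```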
Obviously, if you need to protect your global variable with some sort of mutex in order to keep it thread-safe, the locking will almost certainly cost you more than any performance gain.
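One way to keep direct, unlocked access while staying thread-friendly is to give each thread its own statistics block and merge the blocks once at the end. The sketch below assumes a simple count/sum/max record; `struct scan_stats`, `scan_chunk` and `merge_stats` are hypothetical names, and the threading itself is left out to keep the example portable.

```c
#include <stddef.h>

/* Hypothetical per-thread statistics record. */
struct scan_stats {
    size_t count;
    unsigned long long sum;
    unsigned char max;
};

/* Touches only caller-owned state: no globals and no locks, so several
   threads can run it at the same time on different buffers. */
void scan_chunk(const unsigned char *buf, size_t len, struct scan_stats *st)
{
    struct scan_stats local = *st;   /* private copy for the hot loop */
    for (size_t i = 0; i < len; i++) {
        local.count++;
        local.sum += buf[i];
        if (buf[i] > local.max)
            local.max = buf[i];
    }
    *st = local;                     /* single write-back at the end */
}

/* Combine two per-thread results, e.g. after the worker threads are joined. */
void merge_stats(struct scan_stats *into, const struct scan_stats *from)
{
    into->count += from->count;
    into->sum += from->sum;
    if (from->max > into->max)
        into->max = from->max;
}
```

Each worker would call `scan_chunk` on its own slice with its own `struct scan_stats`, and the main thread would call `merge_stats` once per worker after joining them, so no mutex is needed on the hot path.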