def main():
    for i in xrange(10**8):
        pass
main()
This piece of code in Python runs in the following time (measured with the time command in Bash on Linux):
real 0m1.841s
user 0m1.828s
sys 0m0.012s
However, if the for loop isn't placed within a function,
for i in xrange(10**8):
    pass
then it runs for a much longer time:
real 0m4.543s
user 0m4.524s
sys 0m0.012s
Why is this?
Inside a function, the bytecode is:
  2           0 SETUP_LOOP              20 (to 23)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)

  3          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE
At the top level, the bytecode is:
  1           0 SETUP_LOOP              20 (to 23)
              3 LOAD_NAME                0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_NAME               1 (i)

  2          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK
        >>   23 LOAD_CONST               2 (None)
             26 RETURN_VALUE
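The difference is that STORE_FAST is faster (!) than STORE_NAME. This is because in a function, i is a local, but at toplevel it is a global. In CPython, locals live in a fixed-size array and are accessed by index, whereas a store to a global goes through a dictionary lookup by name on every iteration.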
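The effect can be reproduced from a single script rather than two separate runs under time. The following is a minimal sketch, not the measurement used above: it assumes Python 3 (so range stands in for xrange), uses a smaller loop count N to keep the run short, and the names time_source, TOPLEVEL_SRC and FUNCTION_SRC are made up for illustration. Both snippets are compiled as module-level code and executed in a fresh namespace, so the first one stores i by name and the second one stores it as a function local.

```python
import time

N = 10**6  # assumption: far smaller than the 10**8 in the question, to keep the run short

# The loop at module level: i is stored by name in the namespace dict.
TOPLEVEL_SRC = "for i in range(N):\n    pass\n"

# The same loop wrapped in a function: i is a local variable.
FUNCTION_SRC = "def main():\n    for i in range(N):\n        pass\nmain()\n"


def time_source(src):
    """Compile src as module-level code, run it, and return the elapsed seconds."""
    code = compile(src, "<bench>", "exec")
    namespace = {"N": N}  # fresh globals for each run
    start = time.perf_counter()
    exec(code, namespace)
    return time.perf_counter() - start


toplevel_seconds = time_source(TOPLEVEL_SRC)
function_seconds = time_source(FUNCTION_SRC)

print("top level: %.3fs" % toplevel_seconds)
print("function:  %.3fs" % function_seconds)
```

The absolute numbers and the size of the gap depend on the interpreter version and the machine, so treat the output as indicative only.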
To examine bytecode, use the dis module. I was able to disassemble the function directly, but to disassemble the toplevel code I had to use the compile builtin.
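As a rough illustration of that approach, here is a sketch that assumes Python 3.4 or later, where dis.get_instructions is available and range replaces xrange. The helper store_ops and the constant SOURCE are hypothetical names introduced for this example. The exact opcodes differ from the Python 2 listings above, but the local versus by-name store distinction is the same.

```python
import dis

# The top-level loop as source text; compile() turns it into a code object
# without running it.
SOURCE = "for i in range(10):\n    pass\n"


def main():
    for i in range(10):
        pass


def store_ops(code_or_func):
    """Return the names of all STORE_* instructions in the given code."""
    return [
        ins.opname
        for ins in dis.get_instructions(code_or_func)
        if ins.opname.startswith("STORE")
    ]


# A function object can be passed to dis directly.
func_ops = store_ops(main)

# Module-level code has to be compiled to a code object first.
top_ops = store_ops(compile(SOURCE, "<toplevel>", "exec"))

print("function: ", func_ops)
print("top level:", top_ops)
```

To get a full listing like the ones shown earlier, call dis.dis(main) and dis.dis(compile(SOURCE, "<toplevel>", "exec")) instead of collecting the opcode names.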