Why is Python 3 considerably slower than Python 2?

gsb-eng · Jul 21, 2015 · Viewed 17.6k times

I've been trying to understand why Python 3 takes so much longer than Python 2 in certain situations. Below are a few cases I've compared between Python 3.4 and Python 2.7.

Note: I've gone through some related questions, such as "Why is there no xrange function in Python3?", "loop in python3 much slower than python2" and "Same code slower in Python3 as compared to Python2", but I feel I still don't understand the actual reason behind this issue.

I've tried this piece of code to show the difference:

MAX_NUM = 3*10**7

# Make xrange available on Python 3, where it was renamed to range.
try:
    xrange
except NameError:
    xrange = range


def foo():
    i = MAX_NUM
    while i > 0:
        i -= 1

def foo_for():
    for i in xrange(MAX_NUM):
        pass

When I ran this program with py3.4 and py2.7, I got the results below.

Note: These timings come from a 64-bit machine with a 2.6 GHz processor; each function was timed with time.time() around a single run.

Output: Python 3.4
-----------------
2.6392083168029785
0.9724123477935791

Output: Python 2.7
------------------
1.5131521225
0.475143909454
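A single time.time() measurement can be noisy, so the comparison may be easier to reproduce with timeit. The following is only a sketch: the helper name best_time and the reduced MAX_NUM are assumptions made here to keep the run short, not part of the original measurement.

```python
from timeit import repeat

# Smaller than the 3*10**7 used in the question, so the sketch finishes quickly.
MAX_NUM = 3 * 10**5

# Make xrange available on Python 3, where it was renamed to range.
try:
    xrange
except NameError:
    xrange = range


def foo():
    i = MAX_NUM
    while i > 0:
        i -= 1


def foo_for():
    for i in xrange(MAX_NUM):
        pass


def best_time(func, runs=3):
    # Time one call of func per sample and keep the fastest sample,
    # which is the least disturbed by other activity on the machine.
    return min(repeat(func, number=1, repeat=runs))


if __name__ == "__main__":
    print("while loop: %f" % best_time(foo))
    print("for loop:   %f" % best_time(foo_for))
```

Running the same file under both interpreters should give numbers that can be compared directly, although the absolute values depend on the machine.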

I really don't think any changes were made to while or xrange between 2.7 and 3.4. I know that range in py3.4 now behaves the way xrange used to, but as the documentation says:

range() now behaves like xrange() used to behave, except it works with values of arbitrary size. The latter no longer exists.

This means the change from xrange to range amounts to little more than a rename, plus support for values of arbitrary size.
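The "arbitrary size" part of that quote can be checked with a short sketch like the one below, written for Python 3. The function name values_past_maxsize is made up for this illustration. On Python 2, xrange() with the same bounds should raise an OverflowError instead, because its arguments must fit in a C long.

```python
import sys


def values_past_maxsize(count=3):
    # range() on Python 3 accepts bounds at or beyond sys.maxsize.
    start = sys.maxsize
    return list(range(start, start + count))


if __name__ == "__main__":
    print(values_past_maxsize())
```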

I've checked the disassembled bytecode as well.

Below is the disassembled byte code for function foo():

Python 3.4
----------

 13           0 LOAD_GLOBAL              0 (MAX_NUM)
              3 STORE_FAST               0 (i)

 14           6 SETUP_LOOP              26 (to 35)
        >>    9 LOAD_FAST                0 (i)
             12 LOAD_CONST               1 (0)
             15 COMPARE_OP               4 (>)
             18 POP_JUMP_IF_FALSE       34

 15          21 LOAD_FAST                0 (i)
             24 LOAD_CONST               2 (1)
             27 INPLACE_SUBTRACT
             28 STORE_FAST               0 (i)
             31 JUMP_ABSOLUTE            9
        >>   34 POP_BLOCK
        >>   35 LOAD_CONST               0 (None)
             38 RETURN_VALUE

Python 2.7
----------

 13           0 LOAD_GLOBAL              0 (MAX_NUM)
              3 STORE_FAST               0 (i)

 14           6 SETUP_LOOP              26 (to 35)
        >>    9 LOAD_FAST                0 (i)
             12 LOAD_CONST               1 (0)
             15 COMPARE_OP               4 (>)
             18 POP_JUMP_IF_FALSE       34

 15          21 LOAD_FAST                0 (i)
             24 LOAD_CONST               2 (1)
             27 INPLACE_SUBTRACT    
             28 STORE_FAST               0 (i)
             31 JUMP_ABSOLUTE            9
        >>   34 POP_BLOCK           
        >>   35 LOAD_CONST               0 (None)
             38 RETURN_VALUE        

And below is the disassembled byte code for function foo_for():

Python 3.4
----------

 19           0 SETUP_LOOP              20 (to 23)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (MAX_NUM)
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 GET_ITER
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)

 20          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE


Python 2.7
----------

 19           0 SETUP_LOOP              20 (to 23)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (MAX_NUM)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)

 20          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK           
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE        

Comparing the two, both versions produce the same disassembled bytecode.
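One way to make such a comparison less manual is to dump only the opcode names and diff the output of the two interpreters. This is a sketch for Python 3.4 or later, where dis.get_instructions() is available; Python 2.7 only offers dis.dis(), and the helper name opcode_names is an assumption of this example. Note that identical opcode listings only show what the compiler emitted; the C code that executes each opcode can still differ between versions.

```python
import dis


def foo():
    i = 10
    while i > 0:
        i -= 1


def opcode_names(func):
    # Collect just the opcode names, dropping offsets and arguments,
    # so that two listings can be compared line by line.
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    print("\n".join(opcode_names(foo)))
```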

So I'm wondering which change between 2.7 and 3.4 is causing this large difference in execution time for this piece of code.

Answer

Martijn Pieters · Jul 21, 2015

The difference is in the implementation of the int type. Python 3.x uses the arbitrary-sized integer type (long in 2.x) exclusively, while in Python 2.x for values up to sys.maxint a simpler int type is used that uses a simple C long under the hood.
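The type split can be seen directly with a sketch like this one (the function name int_type_names is only for illustration). On Python 3 both values report the type int; on Python 2 the second value would be expected to report long, since it no longer fits in sys.maxint.

```python
import sys


def int_type_names():
    small = 1
    big = sys.maxsize + 1  # one past sys.maxsize
    return type(small).__name__, type(big).__name__


if __name__ == "__main__":
    print(int_type_names())
```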

Once you limit your loops to long integers, Python 3.x is faster:

>>> import sys
>>> from timeit import timeit
>>> MAX_NUM = 3*10**3
>>> def bar():
...     i = MAX_NUM + sys.maxsize
...     while i > sys.maxsize:
...         i -= 1
... 

Python 2:

>>> timeit(bar, number=10000)
5.704327821731567

Python 3:

>>> timeit(bar, number=10000)
3.7299320790334605

I used sys.maxsize as sys.maxint was dropped from Python 3, but the integer value is basically the same.

The speed difference in Python 2 is thus limited to integers up to sys.maxint: (2 ** 63) - 1 on 64-bit systems, (2 ** 31) - 1 on 32-bit systems.
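Because the Python 2 int is backed by a C long, the exact cut-off follows the platform's long size rather than the pointer size. As a rough sketch (the helper name c_long_max is an assumption here), the boundary can be estimated on any interpreter with struct. On 64-bit Windows a C long is 32 bits, so the result there is (2 ** 31) - 1 even though sys.maxsize is larger.

```python
import struct
import sys


def c_long_max():
    # struct.calcsize("l") is the native size in bytes of a C long.
    bits = 8 * struct.calcsize("l")
    return 2 ** (bits - 1) - 1


if __name__ == "__main__":
    print("largest C long:", c_long_max())
    print("sys.maxsize:   ", sys.maxsize)
```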

Since you cannot use the long type with xrange() on Python 2, I did not include a comparison for that function.