For Python 3.2: Why does adding bytes to a bytes object take longer as more bytes are added?

user1660861 picture user1660861 · Sep 10, 2012 · Viewed 8.8k times · Source

I recently started learning Python (my first programming language since using GW BASIC as a kid). I’ve noticed that when adding bytes to a bytes object, each byte takes more time to add than the last; and by contrast, when adding integers to a list object, each integer takes the same amount of time to add as the last. The following program illustrates.

import time
import struct
time.clock() # for Windows

def time_list():
    print("adding 9,999,999 0s to one list 9 times:")
    a = []
    for i in range(9):
        start_time = time.clock()
        for j in range(9999999):
            a += [0]
        end_time = time.clock()
        print("loop %d took %f seconds" %(i, end_time - start_time))
    print()

def time_bytes_object():
    print("adding 99,999 pad bytes to a bytes object 9 times:")
    a = bytes()
    for i in range(9):
        start_time = time.clock()
        for j in range(99999):
            a += struct.pack('<B', 0)
        end_time = time.clock()
        print("loop %d took %f seconds" %(i, end_time - start_time))
    print()

time_list()
time_bytes_object()

What is it about the bytes object (or the struct.pack function) that makes adding bytes take increasing amounts of time? Or is there a faster way to collect a bunch of bytes than the way used in my example?

Thanks for any help,

Victor

Answer

nneonneo picture nneonneo · Sep 10, 2012

Byte strings (and Unicode strings) in Python are immutable, whereas lists are mutable.

What this means is that every append (+=) done on a byte string must make a copy of that string; the original is not modified (though it will be garbage-collected later). In contrast, the append method of list (also used by +=) will actually modify the list.

What you want is the bytearray type, which is a mutable type functioning much like a list of bytes. Appending to a bytearray takes (amortized) constant time, and it is easily converted to and from a byte string.