Is it possible to efficiently initialize a bytearray with a non-zero value?

Mikhail M. · Nov 30, 2016

I need a huge boolean array, with every value initialized to True:

arr = [True] * (10 ** 9)

But a list created this way takes too much memory, so I decided to use a bytearray instead:

arr = bytearray(10 ** 9)  # initialized with zeroes

Is it possible to initialize a bytearray with b'\x01' as efficiently as it is initialized with b'\x00'?

I understand I could initialize the bytearray with zeroes and invert my logic, but I'd prefer not to do that if possible.

timeit:

>>> from timeit import timeit
>>> def f1():
...   return bytearray(10**9)
... 
>>> def f2():
...   return bytearray(b'\x01'*(10**9))
... 
>>> timeit(f1, number=100)
14.117428014000325
>>> timeit(f2, number=100)
51.42543800899875

Answer

ShadowRanger · Nov 30, 2016

Easy, use sequence multiplication:

arr = bytearray(b'\x01') * 10 ** 9

The same approach works for initializing with zeroes (bytearray(b'\x00') * 10 ** 9), and it's generally preferred, since passing an integer to the bytes or bytearray constructor has been a source of confusion before (people sometimes expect it to create a single-element object holding that integer's value, when it actually sets the length).
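To make the constructor confusion concrete, here is a small sketch contrasting the integer form with the multiplication form. The variable names are only illustrative, and the length of 5 is an arbitrary small value chosen for readability.

```python
# An integer argument sets the LENGTH of the new bytearray, not its contents.
zeros = bytearray(5)

# To get a single byte holding the value 5, pass an iterable of integers.
single = bytearray([5])

# Sequence multiplication states the fill value explicitly.
ones = bytearray(b'\x01') * 5

print(zeros)   # bytearray(b'\x00\x00\x00\x00\x00')
print(single)  # bytearray(b'\x05')
print(ones)    # bytearray(b'\x01\x01\x01\x01\x01')
```

Writing the fill byte out, even when it is b'\x00', keeps both the zero and non-zero cases in the same form.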

Initialize the single-element bytearray first and then multiply it, rather than multiplying the bytes object and passing the result to the bytearray constructor. That avoids doubling your peak memory requirements, and it avoids reading from one huge array and writing to another on top of the simple memset-like operation on a single array that any solution requires.
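One way to check the peak-memory difference yourself is with the standard tracemalloc module. This is only a rough sketch: the helper names (peak_bytes, repeat_first, convert_after) are made up for illustration, the size is scaled down from 10 ** 9 so it runs quickly, and the exact numbers depend on your Python build.

```python
import tracemalloc

def peak_bytes(build, n):
    """Return the peak traced allocation, in bytes, while build(n) runs."""
    tracemalloc.start()
    try:
        build(n)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

def repeat_first(n):
    # One-byte bytearray, repeated: only the final array is large.
    return bytearray(b'\x01') * n

def convert_after(n):
    # Large temporary bytes object, then copied into a second large array.
    return bytearray(b'\x01' * n)

n = 10 ** 6  # assumption: scaled down from 10 ** 9 for a quick check
peak_repeat_first = peak_bytes(repeat_first, n)
peak_convert_after = peak_bytes(convert_after, n)

print("repeat first: ", peak_repeat_first, "bytes at peak")
print("convert after:", peak_convert_after, "bytes at peak")
```

If the reasoning above holds on your interpreter, the second figure should come out at roughly twice the first, because the temporary bytes object and the new bytearray are alive at the same time.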

In my local tests, bytearray(b'\x01') * 10 ** 9 ran just as fast as bytearray(10 ** 9): both took ~164 ms per loop, vs. 434 ms for multiplying the bytes object and then passing it to the bytearray constructor.
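If you want to reproduce the comparison on your own machine, a sketch along these lines times all three variants side by side. The function names and the reduced size N are assumptions for illustration; absolute timings will differ from the figures quoted above.

```python
from timeit import timeit

N = 10 ** 7  # assumption: scaled down from 10 ** 9 so the sketch runs quickly
RUNS = 10

def zero_fill():
    # Integer argument: N zero bytes.
    return bytearray(N)

def repeat_first():
    # Build a one-byte bytearray, then repeat it.
    return bytearray(b'\x01') * N

def convert_after():
    # Repeat the bytes object, then copy it into a bytearray.
    return bytearray(b'\x01' * N)

for func in (zero_fill, repeat_first, convert_after):
    seconds = timeit(func, number=RUNS)
    print(f"{func.__name__}: {seconds / RUNS * 1000:.2f} ms per call")
```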