I wish to compute a simple checksum : just adding the values of all bytes.
The quickest way I found is:
checksum = sum([ord(c) for c in buf])
But for 13 Mb data buf, it takes 4.4 s : too long (in C, it takes 0.5 s)
If I use :
checksum = zlib.adler32(buf) & 0xffffffff
it takes 0.8 s, but the result is not the one I want.
So my question is: is there any function, or lib or C to include in python 2.6, to compute a simple checksum ?
Thanks by advance, Eric.
You could use sum(bytearray(buf))
:
In [1]: buf = b'a'*(13*(1<<20))
In [2]: %timeit sum(ord(c) for c in buf)
1 loops, best of 3: 1.25 s per loop
In [3]: %timeit sum(imap(ord, buf))
1 loops, best of 3: 564 ms per loop
In [4]: %timeit b=bytearray(buf); sum(b)
10 loops, best of 3: 101 ms per loop
Here's a C extension for Python written in Cython, sumbytes.pyx
file:
from libc.limits cimport ULLONG_MAX, UCHAR_MAX
def sumbytes(bytes buf not None):
cdef:
unsigned long long total = 0
unsigned char c
if len(buf) > (ULLONG_MAX // <size_t>UCHAR_MAX):
raise NotImplementedError #todo: implement for > 8 PiB available memory
for c in buf:
total += c
return total
sumbytes
is ~10 time faster than bytearray
variant:
name time ratio
sumbytes_sumbytes 12 msec 1.00
sumbytes_numpy 29.6 msec 2.48
sumbytes_bytearray 122 msec 10.19
To reproduce the time measurements, download reporttime.py
and run:
#!/usr/bin/env python
# compile on-the-fly
import pyximport; pyximport.install() # pip install cython
import numpy as np
from reporttime import get_functions_with_prefix, measure
from sumbytes import sumbytes # from sumbytes.pyx
def sumbytes_sumbytes(input):
return sumbytes(input)
def sumbytes_bytearray(input):
return sum(bytearray(input))
def sumbytes_numpy(input):
return np.frombuffer(input, 'uint8').sum() # @root's answer
def main():
funcs = get_functions_with_prefix('sumbytes_')
buf = ''.join(map(unichr, range(256))).encode('latin1') * (1 << 16)
measure(funcs, args=[buf])
main()