Clearing a small integer array: memset vs. for loop

Claudiu · Jul 15, 2009

There are two ways to zero out an integer/float array:

memset(array, 0, sizeof(int)*arraysize);

or:

for (int i = 0; i < arraysize; ++i)
    array[i] = 0;

Obviously, memset is faster for a large arraysize. However, at what point does the overhead of memset become larger than the overhead of the for loop? For example, for an array of size 5, which would be best: the first, the second, or maybe even the unrolled version?

array[0] = 0;
array[1] = 0;
array[2] = 0;
array[3] = 0;
array[4] = 0;

Answer

Michael Burr · Jul 15, 2009

In all likelihood, memset() will be inlined by your compiler. Most compilers treat it as an 'intrinsic', which basically means it's inlined, except maybe at the lowest optimization levels or when intrinsics are explicitly disabled.

For example, here are some release notes from GCC 4.3:

Code generation of block move (memcpy) and block set (memset) was rewritten. GCC can now pick the best algorithm (loop, unrolled loop, instruction with rep prefix or a library call) based on the size of the block being copied and the CPU being optimized for. A new option -minline-stringops-dynamically has been added. With this option string operations of unknown size are expanded such that small blocks are copied by in-line code, while for large blocks a library call is used. This results in faster code than -minline-all-stringops when the library implementation is capable of using cache hierarchy hints. The heuristic choosing the particular algorithm can be overwritten via -mstringop-strategy. Newly also memset of values different from 0 is inlined.

It might be possible for the compiler to do something similar with the alternative examples you gave, but I'd bet it's less likely to.

memset() is also grep-able, and its intent is more immediately obvious at a glance (not that the loop is particularly difficult to grok either).