What is the advantage of using memset() in C?

embedded_guy · Dec 16, 2011 · Viewed 8.8k times

I was curious whether there is any efficiency advantage to using memset() in a situation like the one below.

Given the following buffer declarations...

struct More_Buffer_Info
{
    unsigned char a[10];
    unsigned char b[10];
    unsigned char c[10];
};

struct My_Buffer_Type
{
    struct More_Buffer_Info buffer_info[100];
};

struct My_Buffer_Type my_buffer[5];

unsigned char *p;
p = (unsigned char *)my_buffer;

Besides having fewer lines of code, is there an advantage to using this:

memset((void *)p, 0, sizeof(my_buffer));

Over this:

for (size_t i = 0; i < sizeof(my_buffer); i++)
{
    *p++ = 0;
}

Answer

Mysticial · Dec 16, 2011

This applies to both memset() and memcpy():

  1. Less Code: As you have already mentioned, it takes fewer lines of code.
  2. More Readable: A single memset() call states what is happening more directly than the loop does.
  3. It can be faster: It can sometimes allow more aggressive compiler optimizations.
  4. Misalignment: In some cases, when you're dealing with misaligned data on a processor that doesn't support misaligned accesses, memset() and memcpy() may be the only clean solution.

To expand on the 3rd point, memset() can be heavily optimized by the compiler, for example with SIMD instructions. If you write a loop instead, the compiler first needs to "figure out" what the loop does before it can attempt to optimize it.

The basic idea here is that memset() and similar library functions, in some sense, "tell" the compiler your intent.
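The misalignment point in the list above can be sketched as follows. This is only an illustration: load_u32() and store_u32() are hypothetical helper names, not anything from the question. They read and write a 32-bit value at an arbitrary byte offset in a buffer by going through memcpy(), which places no alignment requirement on either pointer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value starting at any byte offset.
   Casting (buf + offset) to uint32_t * and dereferencing it would be
   undefined behaviour when the address is not suitably aligned;
   memcpy() has no such requirement. */
static uint32_t load_u32(const unsigned char *buf, size_t offset)
{
    uint32_t value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

/* Hypothetical helper: the matching store to any byte offset. */
static void store_u32(unsigned char *buf, size_t offset, uint32_t value)
{
    memcpy(buf + offset, &value, sizeof value);
}
```

Calling store_u32(buf, 1, x) and then load_u32(buf, 1) returns x even though offset 1 is not 4-byte aligned. On targets that do support misaligned accesses, compilers will often reduce such a fixed-size memcpy() to a single load or store, but that depends on the compiler and its flags.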


As mentioned by @Oli in the comments, there are some downsides. I'll expand on them here:

  1. You need to make sure that memset() actually does what you want. The C standard does not guarantee that the zero value of every type is stored as all-zero bytes; null pointers and floating-point zeros, for example, are allowed to have a different representation.
  2. For non-zero data, memset() can only repeat a single byte value. So you can't use it to set an array of ints to something other than zero, unless every byte of the value is identical (such as 0x01010101).
  3. Although rare, there are some corner cases where it's actually possible to beat the compiler in performance with your own loop.*
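The single-byte restriction in point 2 is easy to trip over, so here is a minimal sketch of it. The function names are made up for illustration, and uint32_t is used so the element size is known to be 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* memset() writes the same byte value into every byte, so each 4-byte
   element ends up as 0x01010101 rather than 1. */
static void fill_bytes_with_one(uint32_t *arr, size_t count)
{
    memset(arr, 1, count * sizeof *arr);
}

/* Storing an arbitrary element value needs an element-wise loop. */
static void fill_elements(uint32_t *arr, size_t count, uint32_t value)
{
    for (size_t i = 0; i < count; i++) {
        arr[i] = value;
    }
}
```

After fill_bytes_with_one(a, n) every element of a holds 0x01010101 (16843009), while fill_elements(a, n, 1) gives the array of ones that was probably intended.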

*Here is one example of such a corner case from my experience:

Although memset() and memcpy() are usually treated as compiler intrinsics and given special handling, they are still generic functions. They carry no information about the data type, including the alignment of the data.

So in a few (albeit rare) cases, the compiler isn't able to determine the alignment of the memory region and must produce extra code to handle misalignment. If you, the programmer, are 100% sure of the alignment, using a loop might actually be faster.

A common example is when using SSE/AVX intrinsics, such as when copying a 16/32-byte aligned array of floats. If the compiler can't determine the 16/32-byte alignment, it will need to use misaligned loads/stores and/or extra handling code. If you simply write a loop using the aligned SSE/AVX load/store intrinsics, you can probably do better.

float *ptrA = ...  //  some unknown source, guaranteed to be 32-byte aligned
float *ptrB = ...  //  some unknown source, guaranteed to be 32-byte aligned
int length = ...   //  some unknown source, guaranteed to be multiple of 8

//  memcpy() - Compiler can't read comments. It doesn't know the data is 32-byte
//  aligned. So it may generate unnecessary misalignment handling code.
memcpy(ptrA, ptrB, length * sizeof(float));

//  This loop could potentially be faster because it "uses" the fact that
//  the pointers are aligned. The compiler can also further optimize this.
for (int c = 0; c < length; c += 8){
    _mm256_store_ps(ptrA + c, _mm256_load_ps(ptrB + c));
}
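As a side note to the example above, some compilers offer a way to state the alignment guarantee in code rather than in a comment. The sketch below assumes GCC or Clang, where __builtin_assume_aligned() is available as an extension; it is not standard C, and copy_aligned_floats() is a hypothetical name. Whether it actually changes the code generated for memcpy() depends on the compiler version and flags, so treat it as something to measure rather than a guaranteed win.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: GCC or Clang. __builtin_assume_aligned() is a compiler
   extension. The caller must really pass 32-byte aligned pointers;
   if the promise is false, the behaviour is undefined. */
static void copy_aligned_floats(float *dst, const float *src, size_t count)
{
    float *d = __builtin_assume_aligned(dst, 32);
    const float *s = __builtin_assume_aligned(src, 32);

    /* Same copy as before, but the compiler may now use the alignment. */
    memcpy(d, s, count * sizeof *d);
}
```

The result is the same as a plain memcpy(); only the information available to the optimizer differs.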