I was curious whether there is any efficiency advantage to using memset() in a situation like the one below.
Given the following buffer declarations...
struct More_Buffer_Info
{
    unsigned char a[10];
    unsigned char b[10];
    unsigned char c[10];
};
struct My_Buffer_Type
{
    struct More_Buffer_Info buffer_info[100];
};
struct My_Buffer_Type my_buffer[5];
unsigned char *p;
p = (unsigned char *)my_buffer;
Besides having fewer lines of code, is there an advantage to using this:
memset((void *)p, 0, sizeof(my_buffer));
Over this:
for (i = 0; i < sizeof(my_buffer); i++)
{
    *p++ = 0;
}
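This applies to both memset() and memcpy():
1. Less code: as you already mentioned, it takes fewer lines.
2. More readable: shorter code is usually easier to read (memset() is more readable than that loop).
3. It can be faster: a library call can allow more aggressive compiler optimizations.
4. In some cases, memset() and memcpy() may be the only clean solution.
To expand on the 3rd point, memset() can be heavily optimized by the compiler using SIMD and such. If you write a loop instead, the compiler first needs to "figure out" what the loop does before it can attempt to optimize it.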
The basic idea here is that memset() and similar library functions, in some sense, "tell" the compiler your intent.
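As a rough illustration of the point about intent, here is a minimal sketch that clears the question's buffer both ways. The helper names clear_with_memset(), clear_with_loop(), and all_zero() are made up for this example. Both clearing functions leave the memory in the same state; the difference is only in how directly the code states what it is doing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct More_Buffer_Info
{
    unsigned char a[10];
    unsigned char b[10];
    unsigned char c[10];
};

struct My_Buffer_Type
{
    struct More_Buffer_Info buffer_info[100];
};

/* Hypothetical helper: one call that states the intent directly. */
void clear_with_memset(struct My_Buffer_Type *buf, size_t count)
{
    assert(buf != NULL);
    memset(buf, 0, count * sizeof *buf);
}

/* Hypothetical helper: the same effect, one byte at a time. */
void clear_with_loop(struct My_Buffer_Type *buf, size_t count)
{
    assert(buf != NULL);
    unsigned char *p = (unsigned char *)buf;
    for (size_t i = 0; i < count * sizeof *buf; i++)
    {
        p[i] = 0;
    }
}

/* Hypothetical helper: returns 1 if every byte of the array is zero. */
int all_zero(const struct My_Buffer_Type *buf, size_t count)
{
    const unsigned char *p = (const unsigned char *)buf;
    for (size_t i = 0; i < count * sizeof *buf; i++)
    {
        if (p[i] != 0)
        {
            return 0;
        }
    }
    return 1;
}
```

Whether the loop version ends up as fast as the memset() version depends on whether your compiler recognizes the pattern, so it is worth checking the generated code if it matters.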
As mentioned by @Oli in the comments, there are some downsides. I'll expand on them here:
1. You need to make sure memset() actually does what you want. The standard doesn't say that zeros for the various datatypes are necessarily zero in memory.
2. memset() is restricted to a single byte value. So you can't use memset() if you want to set an array of ints to something other than zero (or 0x01010101 or something...).
3. Although rare, there are cases where your own loop can be faster.*
*I'll give one example of this from my experience:
Although memset() and memcpy() are usually compiler intrinsics with special handling by the compiler, they are still generic functions. They say nothing about the datatype, including the alignment of the data.
So in a few (albeit rare) cases, the compiler isn't able to determine the alignment of the memory region, and thus must produce extra code to handle misalignment. Whereas if you, the programmer, are 100% sure of the alignment, using a loop might actually be faster.
A common example is when using SSE/AVX intrinsics (such as copying a 16/32-byte aligned array of floats). If the compiler can't determine the 16/32-byte alignment, it will need to use misaligned loads/stores and/or handling code. If you simply write a loop using SSE/AVX aligned load/store intrinsics, you can probably do better.
float *ptrA = ... // some unknown source, guaranteed to be 32-byte aligned
float *ptrB = ... // some unknown source, guaranteed to be 32-byte aligned
int length = ... // some unknown source, guaranteed to be multiple of 8
// memcpy() - Compiler can't read comments. It doesn't know the data is 32-byte
// aligned. So it may generate unnecessary misalignment handling code.
memcpy(ptrA, ptrB, length * sizeof(float));
// This loop could potentially be faster because it "uses" the fact that
// the pointers are aligned. The compiler can also further optimize this.
for (int c = 0; c < length; c += 8) {
    _mm256_store_ps(ptrA + c, _mm256_load_ps(ptrB + c));
}
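For completeness, here is a sketch of a different way to get the alignment information to the compiler without hand-writing the loop. This is not part of the example above, and it assumes GCC or Clang, since __builtin_assume_aligned() is a compiler extension rather than standard C. The function name copy_floats_aligned32() is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not from the example above.
 * ASSUMPTION: GCC or Clang. __builtin_assume_aligned() is a compiler
 * extension. The caller must guarantee that both pointers are 32-byte
 * aligned; if they are not, the behavior is undefined.
 */
void copy_floats_aligned32(float *dst, const float *src, size_t length)
{
    /* Debug-build check of the promise made to the compiler below. */
    assert((uintptr_t)dst % 32 == 0);
    assert((uintptr_t)src % 32 == 0);

    /* Promise the compiler that both pointers are 32-byte aligned. */
    float *d = __builtin_assume_aligned(dst, 32);
    const float *s = __builtin_assume_aligned(src, 32);

    /* Still a plain memcpy(); the compiler may now skip misalignment handling. */
    memcpy(d, s, length * sizeof(float));
}
```

Whether this matches the hand-written intrinsic loop depends on the compiler, the flags, and the target, so treat it as something to measure rather than a guaranteed win.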