Copying array to a structure

user1129665 · Jul 24, 2012

I have an array of 9 bytes and I want to copy these bytes to a structure:

#include<stdio.h>
#include<stdlib.h>
#include<string.h>

typedef struct _structure {
    char one[5];        /* 5 bytes */
    unsigned int two;   /* 4 bytes */
} structure;

int main(int argc, char **argv) {

    structure my_structure;

    char array[]    = {
        0x41, 0x42, 0x43, 0x44, 0x00,   /* ABCD\0 */
        0x00, 0xbc, 0x61, 0x4e          /* 12345678 (base 10) */
    };

    memcpy(&my_structure, array, sizeof(my_structure));

    printf("%s\n", my_structure.one);   /* OK, "ABCD" */
    printf("%d\n", my_structure.two);   /* it prints 1128415566 */

    return(0);
}

The first member of my_structure, one, is copied correctly; however, my_structure.two contains 1128415566 while I expect 12345678. array and my_structure have different sizes, and even if they were the same size, there would still be a problem with two. How can I fix this?

Answer

2.718 · Jul 24, 2012

There are a few problems:

For efficiency reasons, compilers align variables on boundaries that suit the processor, usually the natural alignment of each type: on typical 32-bit and 64-bit systems a 4-byte unsigned int is placed on a 4-byte boundary. Structures therefore contain "gaps" (padding) so that every member is suitably aligned. In other words, the struct is not "packed" tightly. Here the compiler typically inserts 3 padding bytes after one[5], so two starts at offset 8 and sizeof(structure) is 12, not 9. Try this:

#include <stdio.h>

typedef struct
{
    char one[5];        /* 5 bytes */
    unsigned int two;   /* 4 bytes */
}
    structure;
structure my_structure;

char array[] = 
{
    0x41, 0x42, 0x43, 0x44, 0x00,   /* ABCD\0 */
    0x00, 0xbc, 0x61, 0x4e          /* 12345678 (base 10) */
};

int main(int argc, char **argv) 
{
    const int sizeStruct = sizeof(structure);
    printf("sizeof(structure) = %d bytes\n", sizeStruct);
    const int sizeArray = sizeof(array);
    printf("sizeof(array) = %d bytes\n", sizeArray);
    return 0;
}

You should see different sizes (typically 12 bytes for the structure and 9 bytes for the array).

You can override this behavior with #pragma pack or with compiler-specific attributes. With gcc (and clang) you can change the structure definition using an attribute. For example, add a "packed" attribute to the code above (requires gcc or a compatible compiler):

typedef struct __attribute__((packed))

Then run the program again. The sizes should now be the same. Note: on some processor architectures, e.g. ARMv4, 32-bit variables must be aligned on a 32-bit boundary, and an unaligned access can raise an exception instead of running normally. Read your compiler's documentation on the "aligned" and "packed" pragmas or attributes.

The next problem is byte order. Try this:

printf("0x%08X\n", 12345678);

12345678 in hex is 0x00BC614E. From your example and the output you are getting, I can tell that your platform is "little endian". On little-endian systems, the number 0x00BC614E is stored as a byte sequence starting with the least significant byte, i.e. 0x4E, 0x61, 0xBC, 0x00. So change your array definition:

char array[] = 
{
    0x41, 0x42, 0x43, 0x44, 0x00,   /* ABCD\0 */
    0x4E, 0x61, 0xBC, 0x00,         /* 12345678 (base 10) */
};

With the packed structure and this array, your program will now print 12345678.

Also note that you should use %u to print an unsigned int.

Copying char strings is potentially a can of worms, especially if you have to allow for different encodings (e.g. Unicode). At the very least, you need to ensure that your destination buffer is protected from overruns.

Revised code:

#include <stdio.h>
#include <string.h>

typedef struct
{
    char one[5];        /* 5 bytes */
    unsigned int two;   /* 4 bytes */
}
    structure;

structure my_structure;

char array[] = 
{
    0x41, 0x42, 0x43, 0x44, 0x00,   /* ABCD\0 */
    0x4E, 0x61, 0xBC, 0x00,         /* 12345678 (base 10) */
};

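int main(void)
{
    // copy the string as a byte array
    memcpy(&my_structure.one, &array[0], sizeof(my_structure.one));

    // copy the unsigned int with memcpy rather than casting &array[5] to
    // (unsigned int *): that pointer is misaligned and the cast violates the
    // strict-aliasing rule, which can crash or miscompile on some platforms
    memcpy(&my_structure.two, &array[5], sizeof(my_structure.two));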

    printf("%s\n", my_structure.one);
    printf("%u\n", my_structure.two);

    return 0;
}

Finally, it is usually a bad idea to rely on packed data structures, because they make porting code to a different platform difficult. However, sometimes you need to pack and unpack protocol packets. In those special cases it is usually best, and most portable, to pack and unpack each item manually, using a pair of functions for each data type.

I will leave endianness issues for another topic. :-)