C++ char array null terminator location

John Mahoney picture John Mahoney · Apr 7, 2012 · Viewed 45.4k times · Source

I am a student learning C++, and I am trying to understand how null-terminated character arrays work. Suppose I define a char array like so:

char* str1 = "hello world";

As expected, strlen(str1) is equal to 11, and it is null-terminated.

Where does C++ put the null terminator, if all 11 elements of the above char array are filled with the characters "hello world"? Is it actually allocating an array of length 12 instead of 11, with the 12th character being '\0'? CPlusPlus.com seems to suggest that one of the 11 would need to be '\0', unless it is indeed allocating 12.

Suppose I do the following:

// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );

// Copy the first one to the second one
strncpy( str2, str1, strlen(str1) );

// Output the second one
cout << "Str2: " << str2 << endl;

This outputs Str2: hello worldatcomY╗°g♠↕, which I assume is C++ reading the memory at the location pointed to by the pointer char* str2 until it encounters what it interprets to be a null character.

However, if I then do this:

// Null-terminate the second one
str2[strlen(str1)] = '\0';

// Output the second one again
cout << "Terminated Str2: " << str2 << endl;

It outputs Terminated Str2: hello world as expected.

But doesn't writing to str2[11] imply that we are writing outside of the allocated memory space of str2, since str2[11] is the 12th byte, but we only allocated 11 bytes?

Running this code does not seem to cause any compiler warnings or run-time errors. Is this safe to do in practice? Would it be better to use malloc( strlen(str1) + 1 ) instead of malloc( strlen(str1) )?

Answer

JaredPar picture JaredPar · Apr 7, 2012

In the case of a string literal the compiler is actually reserving an extra char element for the \0 element.

// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );

This is a common mistake new C programmers make. When allocating the storage for a char* you need to allocate the number of characters + 1 more to store the \0. Not allocating the extra storage here means this line is also illegal

// Null-terminate the second one
str2[strlen(str1)] = '\0';

Here you're actually writing past the end of the memory you allocated. When allocating X elements the last legal byte you can access is the memory address offset by X - 1. Writing to the X element causes undefined behavior. It will often work but is a ticking time bomb.

The proper way to write this is as follows

size_t size = strlen(str1) + sizeof(char);
char* str2 = (char*) malloc(size);
strncpy( str2, str1, size);

// Output the second one
cout << "Str2: " << str2 << endl;

In this example the str2[size - 1] = '\0' isn't actually needed. The strncpy function will fill all extra spaces with the null terminator. Here there are only size - 1 elements in str1 so the final element in the array is unneeded and will be filled with \0