Why do (only) some compilers use the same address for identical string literals?

Eugene Kosov · Oct 15, 2018

https://godbolt.org/z/cyBiWY

I can see two "some" literals in the assembly generated by MSVC, but only one with clang and gcc. As a result, the code behaves completely differently at run time depending on the compiler.

static const char *A = "some";
static const char *B = "some";

void f() {
    if (A == B) {
        throw "Hello, string merging!";
    }
}

Can anyone explain the differences and similarities between those compilation outputs? Why do clang and gcc optimize something even when no optimizations are requested? Is this some kind of undefined behaviour?

I also notice that if I change the declarations to those shown below, clang/gcc/msvc do not leave any "some" in the assembler code at all. Why is the behaviour different?

static const char A[] = "some";
static const char B[] = "some";

Answer

songyuanyao · Oct 15, 2018

This is not undefined behavior, but unspecified behavior. For string literals,

The compiler is allowed, but not required, to combine storage for equal or overlapping string literals. That means that identical string literals may or may not compare equal when compared by pointer.

So the result of A == B may be either true or false, and you shouldn't depend on either outcome.
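If the intent is to ask whether two strings hold the same text, comparing characters rather than addresses avoids the unspecified result. The following is a minimal sketch, not part of the original code; the helper names same_text, same_address and same_content are hypothetical and introduced only for illustration.

```cpp
#include <cstring>

static const char *A = "some";
static const char *B = "some";

// Hypothetical helper: compares the characters the pointers refer to,
// not the pointer values themselves.
inline bool same_text(const char *lhs, const char *rhs) {
    return std::strcmp(lhs, rhs) == 0;
}

// Unspecified result: true if the compiler merged the two literals,
// false if it kept separate storage for each.
inline bool same_address() {
    return A == B;
}

// Well-defined result: true whether or not the literals were merged.
inline bool same_content() {
    return same_text(A, B);
}
```

Here same_content() gives the same answer on every compiler, while same_address() may differ between, say, MSVC and clang/gcc as observed in the question.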

From the standard, [lex.string]/16:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
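That wording applies to string literals themselves. In the array variant from the question, each declaration creates its own array object initialized from the literal, so, as far as I can tell, the two arrays must have different addresses even though their contents match. A small sketch under that assumption (the names ArrA, ArrB and both helper functions are hypothetical):

```cpp
#include <cstring>

static const char ArrA[] = "some";
static const char ArrB[] = "some";

// Two distinct array objects, so their first elements never share an address.
inline bool arrays_share_address() {
    return &ArrA[0] == &ArrB[0];
}

// The contents are copies of the same literal, so they compare equal
// byte for byte (including the terminating '\0').
inline bool arrays_share_content() {
    return sizeof ArrA == sizeof ArrB &&
           std::memcmp(ArrA, ArrB, sizeof ArrA) == 0;
}
```

Because the address comparison can only be false here, a compiler is free to fold it to a constant, which may be why the arrays do not need to appear in the generated assembly at all.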