Consider this code:
int i = 1;
int x = ++i + ++i;
We have some guesses for what a compiler might do for this code, assuming it compiles.
++i
return 2
, resulting in x=4
.++i
returns 2
and the other returns 3
, resulting in x=5
.++i
return 3
, resulting in x=6
.To me, the second seems most likely. One of the two ++
operators is executed with i = 1
, the i
is incremented, and the result 2
is returned. Then the second ++
operator is executed with i = 2
, the i
is incremented, and the result 3
is returned. Then 2
and 3
are added together to give 5
.
However, I ran this code in Visual Studio, and the result was 6
. I'm trying to understand compilers better, and I'm wondering what could possibly lead to a result of 6
. My only guess is that the code could be executed with some "built-in" concurrency. The two ++
operators were called, each incremented i
before the other returned, and then they both returned 3
. This would contradict my understanding of the call stack, and would need to be explained away.
What (reasonable) things could a C++
compiler do that would lead to a result of 4
or a result or 6
?
This example appeared as an example of undefined behavior in Bjarne Stroustrup's Programming: Principles and Practice using C++ (C++ 14).
See cinnamon's comment.
The compiler takes your code, splits it into very simple instructions, and then recombines and arranges them in a way that it thinks optimal.
The code
int i = 1;
int x = ++i + ++i;
consists of the following instructions:
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
5. read i as tmp2
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
9. read i as tmp4
10. add tmp2 and tmp4, as tmp5
11. store tmp5 in x
But despite this being a numbered list the way I wrote it, there are only a few ordering dependencies here: 1->2->3->4->5->10->11 and 1->6->7->8->9->10->11 must stay in their relative order. Other than that the compiler can freely reorder, and perhaps eliminate redundancy.
For example, you could order the list like this:
1. store 1 in i
2. read i as tmp1
6. read i as tmp3
3. add 1 to tmp1
7. add 1 to tmp3
4. store tmp1 in i
8. store tmp3 in i
5. read i as tmp2
9. read i as tmp4
10. add tmp2 and tmp4, as tmp5
11. store tmp5 in x
Why can the compiler do this? Because there's no sequencing to the side effects of the increment. But now the compiler can simplify: for example, there's a dead store in 4: the value is immediately overwritten. Also, tmp2 and tmp4 are really the same thing.
1. store 1 in i
2. read i as tmp1
6. read i as tmp3
3. add 1 to tmp1
7. add 1 to tmp3
8. store tmp3 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x
And now everything to do with tmp1 is dead code: it's never used. And the re-read of i can be eliminated too:
1. store 1 in i
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
10. add tmp3 and tmp3, as tmp5
11. store tmp5 in x
Look, this code is much shorter. The optimizer is happy. The programmer is not, because i was only incremented once. Oops.
Let's look at something else the compiler can do instead: let's go back to the original version.
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
5. read i as tmp2
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
9. read i as tmp4
10. add tmp2 and tmp4, as tmp5
11. store tmp5 in x
The compiler could reorder it like this:
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
5. read i as tmp2
9. read i as tmp4
10. add tmp2 and tmp4, as tmp5
11. store tmp5 in x
and then notice again that i is read twice, so eliminate one of them:
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x
That's nice, but it can go further: it can reuse tmp1:
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
6. read i as tmp1
7. add 1 to tmp1
8. store tmp1 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x
Then it can eliminate the re-read of i in 6:
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
7. add 1 to tmp1
8. store tmp1 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x
Now 4 is a dead store:
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
7. add 1 to tmp1
8. store tmp1 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x
and now 3 and 7 can be merged into one instruction:
1. store 1 in i
2. read i as tmp1
3+7. add 2 to tmp1
8. store tmp1 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x
Eliminate the last temporary:
1. store 1 in i
2. read i as tmp1
3+7. add 2 to tmp1
8. store tmp1 in i
10. add tmp1 and tmp1, as tmp5
11. store tmp5 in x
And now you get the result that Visual C++ is giving you.
Note that in both optimization paths, the important order dependencies were preserved, insofar as the instructions weren't removed for doing nothing.