I'm confused about where to use cmov instructions and where to use jump instructions in assembly.
From a performance point of view, what is the difference between them?
If possible, please explain the difference with an example.
cmovcc is a so-called predicated instruction. That's fancy-speak for "this instruction executes under a condition (predicate)".
Many processors, including x86, set condition code bits after an arithmetic operation (especially a compare instruction) to indicate the status of the result.
A conditional jump instruction checks the condition code bits for a particular status and, if that condition holds, jumps to a designated target; otherwise execution falls through to the next instruction.
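As a rough illustration of how the condition code bits work, here is a small C model of the flags a 32-bit cmp leaves behind and of how a few condition codes read them. The flags_t type and the cmp32/cc_* functions are hypothetical helpers for this sketch (not any real API), and only ZF, SF, CF and OF are modeled; the condition formulas (e: ZF set, l: SF != OF, b: CF set, ae: CF clear) follow x86's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of four x86 arithmetic flags (PF, AF etc. omitted). */
typedef struct {
    bool zf; /* zero flag: result was 0 */
    bool sf; /* sign flag: top bit of the result */
    bool cf; /* carry flag: unsigned borrow, i.e. a < b as unsigned */
    bool of; /* overflow flag: signed overflow in a - b */
} flags_t;

/* Model of "cmp a, b" on 32-bit operands: compute a - b, discard it, keep the flags. */
static flags_t cmp32(uint32_t a, uint32_t b) {
    uint32_t r = a - b;  /* wraps modulo 2^32, like the hardware */
    flags_t f;
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1;
    f.cf = a < b;
    /* signed overflow iff a and b differ in sign and r's sign differs from a's */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1;
    return f;
}

/* A few condition codes, evaluated the way jcc/cmovcc/setcc test them. */
static bool cc_e(flags_t f)  { return f.zf; }          /* equal */
static bool cc_l(flags_t f)  { return f.sf != f.of; }  /* signed less */
static bool cc_b(flags_t f)  { return f.cf; }          /* unsigned below */
static bool cc_ae(flags_t f) { return !f.cf; }         /* unsigned above or equal */
```

For example, cmp32(3, 5) sets CF and makes cc_l true: that is the situation in which a jl would be taken, a cmovl would move, and a setl would write 1.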
Because the jump is conditional, and the processor typically has a deep pipeline, the condition code bits may literally not be ready yet when the CPU encounters the jcc instruction. The chip designers could simply stall until the flags are ready (often many clock cycles) and then execute the jcc, but that would make the processor slow.
Instead, most designers include a branch predictor, which guesses which way a conditional jump will go. The processor can then fetch, decode, and execute along the predicted path (taken or not taken) and keep running at full speed, with the proviso that if the condition code bits that finally arrive show the guess was wrong (a branch mispredict), the processor discards all the work it did after the branch and re-executes the program down the other path.
Conditional jumps are harder for pipelined execution than normal data dependencies, because they can change which instruction should be next in the stream of instructions flowing through the pipeline. This is called a control dependency, as opposed to a data dependency (like an add where both inputs are outputs of other recent instructions).
Branch predictors turn out to be very good, because most branches are biased toward one direction. (The branch at the bottom of a loop, for example, usually jumps back to the top.) So most of the time the processor doesn't have to back out of wrongly predicted work.
If the direction of a branch is essentially random, the predictor will guess wrong about 50% of the time, and the processor has to throw away work on every miss. That's expensive.
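The gap between a biased branch and an unpredictable one shows up even in a toy predictor. The sketch below assumes a single 2-bit saturating counter, a classic textbook scheme and not what any particular x86 core actually uses; count_mispredicts, fill_loop_pattern and fill_random_pattern are hypothetical names.

```c
#include <stdint.h>

/* Toy predictor: one 2-bit saturating counter.
   States 0-1 predict "not taken", states 2-3 predict "taken". */
static int count_mispredicts(const int *taken, int n) {
    unsigned counter = 2;  /* start weakly "taken" */
    int misses = 0;
    for (int i = 0; i < n; i++) {
        int predicted = counter >= 2;
        if (predicted != taken[i])
            misses++;
        /* saturating update toward the actual outcome */
        if (taken[i]) {
            if (counter < 3) counter++;
        } else {
            if (counter > 0) counter--;
        }
    }
    return misses;
}

/* Loop-closing branch: taken (back to the top) trip-1 times, then falls
   through once. Assumes trip > 0. */
static void fill_loop_pattern(int *out, int n, int trip) {
    for (int i = 0; i < n; i++)
        out[i] = (i % trip) != trip - 1;
}

/* Data-dependent branch with no pattern: outcome taken from a xorshift32 PRNG. */
static void fill_random_pattern(int *out, int n, uint32_t seed) {
    uint32_t x = seed ? seed : 1;  /* xorshift state must not be 0 */
    for (int i = 0; i < n; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        out[i] = (x >> 31) & 1;
    }
}
```

For a loop branch with 100 iterations, this counter mispredicts only once per loop exit (100 misses over 10,000 executions), while on the random pattern it misses roughly half the time.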
OK, now, one often finds code like this:
cmp ...
jcc skip
mov register1, register2
skip: ; continue here
...
; use register1
If the branch predictor guesses right, this code is fast, no matter which way the branch goes. If it guesses wrong a lot... ouch.
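A common real-world source of such a branch is a data-dependent if inside a loop. The C sketch below (the function name is made up for illustration) can compile to a cmp/jcc pattern like the one above, though an optimizing compiler is free to emit cmov, setcc or vector code instead. On random input the if is about as predictable as a coin flip; on sorted input it becomes highly predictable.

```c
#include <stddef.h>

/* Sum the elements below threshold using an ordinary if.
   Whether this becomes a conditional jump is the compiler's choice. */
static long sum_below_branchy(const int *a, size_t n, int threshold) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] < threshold)  /* data-dependent branch */
            sum += a[i];
    }
    return sum;
}
```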
Hence the conditional move instruction: a move that takes place only if the condition code bits satisfy its condition. We can rewrite the above:
cmp ...
cmovncc register1, register2 ; ncc = inverse of cc (the mov above runs only when jcc falls through)
...
; use register1
Now we have no branch instructions, and thus no mispredicts that make the processor undo all the work. Since there is no control dependency, the following instructions need to be fetched and decoded regardless of whether the cmovcc acts like a mov or a nop. The pipeline can stay full without predicting the condition and speculatively executing instructions that use register1. (You could build a CPU that predicts cmovcc that way, but it would defeat the purpose of cmovcc.)
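Portable C has no way to demand a cmov, but the idea of turning the condition into data can be sketched with a mask: the comparison produces 0 or 1, that becomes an all-zeros or all-ones mask, and the result is always computed from both candidate values. select_if_less and sum_below_branchless are hypothetical names, and whether the compiler keeps this branch-free (or emits an actual cmov for it) is up to the compiler.

```c
#include <stddef.h>

/* "if (a < b) dest = src" with no if in the source: the condition is
   turned into a mask and both candidates are always read.
   Assumes two's complement ints (true on all mainstream targets). */
static int select_if_less(int a, int b, int dest, int src) {
    int mask = -(a < b);                   /* -1 (all ones) if a < b, else 0 */
    return (src & mask) | (dest & ~mask);  /* src when mask is all ones, else dest */
}

/* Sum the elements below threshold; each element contributes either itself or 0. */
static long sum_below_branchless(const int *a, size_t n, int threshold) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += select_if_less(a[i], threshold, 0, a[i]);
    return sum;
}
```

The cost is the same for every element whatever the data looks like, which is exactly the trade-off cmov makes: no mispredict penalty, but both inputs (and the flags) must be ready before the result is.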
cmovcc converts a control dependency into a data dependency. The CPU treats it exactly like a 3-input math instruction, with the inputs being EFLAGS and its two "regular" operands (the destination register and the source register or memory operand). On x86, adc is identical to cmovae (move if CF==0) as far as how out-of-order execution tracks the dependencies: the inputs are CF and both general-purpose registers, and the output is the destination register (adc additionally writes the flags).
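The dependency claim can be made concrete by writing both instructions as plain functions of the same three inputs. This is only a model (the model_* names are invented here) and it tracks just the destination register; a real adc also writes the flags, which this sketch omits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Both functions read CF, dest and src and produce one new dest value,
   which is how out-of-order hardware sees cmovae and adc. */

/* cmovae dest, src: dest = src if CF == 0, otherwise dest is unchanged. */
static uint32_t model_cmovae(bool cf, uint32_t dest, uint32_t src) {
    return cf ? dest : src;
}

/* adc dest, src: dest = dest + src + CF (mod 2^32). Flag outputs omitted. */
static uint32_t model_adc(bool cf, uint32_t dest, uint32_t src) {
    return dest + src + (uint32_t)cf;
}
```

Neither result can be computed until CF, dest and src are all available, even in the cmovae case where the value of src ends up being ignored.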
For x86, there are cmovcc, jcc, and setcc instructions for every condition cc. (setcc sets the destination to 0 or 1, according to the condition. So it has a data dependency on the flags, and no other input dependencies.)
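setcc is what compilers often reach for when a comparison result is used as a number. In the sketch below (count_below is a hypothetical name), an x86 compiler will commonly compile the comparison to cmp plus setl and a zero-extend, although it may also vectorize the loop; the C source itself only guarantees that (a[i] < threshold) evaluates to 0 or 1.

```c
#include <stddef.h>

/* Count elements below threshold by adding the 0/1 comparison result,
   instead of branching around an increment. */
static size_t count_below(const int *a, size_t n, int threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (a[i] < threshold);  /* 0 or 1, like setcc */
    return count;
}
```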