andi vs. addi instruction in MIPS with negative immediate constant

Lin Yu Cheng · Oct 17, 2016

Assume $t2=0x55555550, then executing the following instruction:

andi $t2, $t2, -1

$t2 becomes 0x00005550.

This is confirmed by the MIPS emulator (see footnote 1).

However, it is not what I expected. I think the answer should be 0x55555550 & 0xFFFFFFFF = 0x55555550. I think the constant -1 was sign extended to 0xFFFFFFFF before the AND operation. But it appears that the answer was 0x55555550 & 0x0000FFFF.

Why is -1 sign extended to 0x0000FFFF instead of 0xFFFFFFFF?


Footnote 1: Editor's note: MARS with "extended pseudo-instructions" enabled does expand this to multiple instructions to generate 0xffffffff in a temporary register, thus leaving $t2 unchanged. Otherwise MARS and SPIM both reject it with an error as not encodable. Other assemblers may differ.

Answer

Craig Estey · Oct 18, 2016

Your expectation is correct, but your interpretation of your experimental results is not.

"$t2 becomes 0x00005550. This is confirmed by the MIPS emulator."

No, this is incorrect. One of the following must be true:

  1. Somehow, you're misreading what the emulator is doing. The actual value from the emulator is what you expected it to be.
  2. Or, you don't have 0x55555550 in $t2 before the andi as you assume, but 0x5550 instead, i.e. your test program doesn't set up $t2 correctly.

"However, it is not what I expected. I think the answer should be 0x55555550 & 0xFFFFFFFF = 0x55555550. I think the constant -1 was sign extended to 0xFFFFFFFF before the and logic."

Yes, this is correct. And, I'll explain what is happening and why below.

"But it appears that the answer was 0x55555550 & 0x0000FFFF. Why is -1 sign extended to 0x0000FFFF instead of 0xFFFFFFFF?"

It wasn't. It was sign extended to 0xFFFFFFFF. Again, you're reading the experimental results incorrectly [or your test program has a bug].


MIPS simulators and assemblers have pseudo-ops.

These are instructions that may or may not exist as real, physical instructions. However, they are interpreted by the assembler to generate a sequence of physical/real instructions.

An example of a "pure" pseudo-op is li ("load immediate"). It has no corresponding instruction, but usually generates a two-instruction sequence: lui followed by ori (both of which are physical instructions).
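To make the li expansion concrete, here is a minimal Python sketch of the arithmetic behind the lui/ori pair. The function names split_li and run_lui_ori are made up for illustration; this models only the register effect, not any particular assembler's code.

```python
def split_li(value):
    """Split a 32-bit constant into the two 16-bit immediates for lui and ori."""
    value &= 0xFFFFFFFF
    upper = value >> 16        # immediate for lui
    lower = value & 0xFFFF     # immediate for ori
    return upper, lower


def run_lui_ori(upper, lower):
    """Model the register effect of lui followed by ori."""
    reg = (upper << 16) & 0xFFFFFFFF   # lui: immediate goes in the upper half, lower half is zero
    reg |= lower & 0xFFFF              # ori: immediate is zero-extended, then ORed in
    return reg


upper, lower = split_li(0x55555550)
print(hex(upper), hex(lower))           # 0x5555 0x5550
print(hex(run_lui_ori(upper, lower)))   # 0x55555550
```

The two halves match the immediates in the lui and ori lines that MARS shows for li $t2,0x55555550.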

Pseudo-ops should not be confused with assembler directives, such as .text, .data, .word, .eqv, etc.

Some pseudo-ops can overlap with actual physical instructions. That is what is happening with your example.

In fact, the assembler examines any given instruction as a potential pseudo-op. It may determine that it can fulfill the intent with a single physical instruction. If not, it will generate a 1-3 instruction sequence and may use the [reserved] $at register [which is $1] as part of that sequence.

In MARS, to see the actual real instructions, look in the Basic column of the source window.

For completeness, everything that follows should be read in light of the comments above.

I've created three example programs:

  1. The addi as in your original post
  2. The andi as in your corrected post
  3. An andi that uses an unsigned argument

(1) Here is the assembler source for your original question using addi:

    .text
    .globl  main
main:
    li      $t2,0x55555550
    addi    $t3,$t2,-1
    nop

Here is how MARS interpreted it:

 Address    Code        Basic                     Source

0x00400000  0x3c015555  lui $1,0x00005555         4     li      $t2,0x55555550
0x00400004  0x342a5550  ori $10,$1,0x00005550
0x00400008  0x214bffff  addi $11,$10,0xffffffff   5     addi    $t3,$t2,-1
0x0040000c  0x00000000  nop                       6     nop

addi will sign extend its 16-bit immediate, so we have 0xFFFFFFFF. Then, doing a two's complement add operation, we have a final result of 0x5555554F.

Thus, the assembler didn't need to generate extra instructions for the addi, so the addi pseudo-op generated a single real addi.
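The same arithmetic can be checked with a small Python model. This is only a sketch: sign_extend16 and addi are illustrative helper names, and the model wraps at 32 bits instead of raising the overflow exception that a real addi can raise.

```python
def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to a 32-bit pattern."""
    imm &= 0xFFFF
    if imm & 0x8000:               # bit 15 set: the value is negative
        return imm | 0xFFFF0000    # copy the sign bit into the upper half
    return imm


def addi(rs, imm):
    """Model addi: add the sign-extended immediate, keeping 32 bits.

    Simplification: the signed-overflow trap of a real addi is ignored.
    """
    return (rs + sign_extend16(imm)) & 0xFFFFFFFF


print(hex(sign_extend16(-1)))      # 0xffffffff
print(hex(addi(0x55555550, -1)))   # 0x5555554f
```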


(2) Here is the andi source:

    .text
    .globl  main
main:
    li      $t2,0x55555550
    andi    $t3,$t2,-1
    nop

Here is the assembly:

 Address    Code        Basic                     Source

0x00400000  0x3c015555  lui $1,0x00005555         4     li      $t2,0x55555550
0x00400004  0x342a5550  ori $10,$1,0x00005550
0x00400008  0x3c01ffff  lui $1,0xffffffff         5     andi    $t3,$t2,-1
0x0040000c  0x3421ffff  ori $1,$1,0x0000ffff
0x00400010  0x01415824  and $11,$10,$1
0x00400014  0x00000000  nop                       6     nop

Whoa! What happened? The andi generated three instructions.

A real andi instruction does not sign extend its immediate argument; it zero extends it. So, the largest unsigned value we can use in a real andi is 0xFFFF.

But, by specifying -1, we told the assembler that we did want sign extension (i.e. 0xFFFFFFFF).

So, the assembler could not fulfill the intent with a single instruction and we get the sequence above. And the generated sequence could not use andi but had to use the register form: and. Here is the andi generated code converted back into more friendly asm source:

    lui     $at,0xFFFF
    ori     $at,$at,0xFFFF
    and     $t3,$t2,$at

As for the result, we're ANDing 0x55555550 with 0xFFFFFFFF, which leaves the value unchanged at 0x55555550.
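The difference between the two forms can be modelled in a few lines of Python. This is a sketch under assumptions: real_andi and expanded_andi are illustrative names, and expanded_andi only mirrors the three-instruction sequence shown above, not the assembler's internals.

```python
def real_andi(rs, imm16):
    """Model a physical andi: the 16-bit immediate is zero-extended."""
    return rs & (imm16 & 0xFFFF)


def expanded_andi(rs, imm):
    """Model the lui / ori / and sequence generated for a 32-bit mask."""
    imm32 = imm & 0xFFFFFFFF            # -1 becomes the pattern 0xFFFFFFFF
    at = (imm32 >> 16) << 16            # lui $at, upper half
    at |= imm32 & 0xFFFF                # ori $at, $at, lower half
    return rs & at                      # and  $t3, $t2, $at


t2 = 0x55555550
print(hex(real_andi(t2, 0xFFFF)))       # 0x5550
print(hex(expanded_andi(t2, -1)))       # 0x55555550
```

A physical andi whose immediate field holds the bit pattern 0xFFFF can only keep the low half of the register, while the expanded sequence keeps all 32 bits.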


(3) Here is the source for an unsigned version of andi:

    .text
    .globl  main
main:
    li      $t2,0x55555550
    andi    $t3,$t2,0xFFFF
    nop

Here is the assembler output:

 Address    Code        Basic                     Source

0x00400000  0x3c015555  lui $1,0x00005555         4     li      $t2,0x55555550
0x00400004  0x342a5550  ori $10,$1,0x00005550
0x00400008  0x314bffff  andi $11,$10,0x0000ffff   5     andi    $t3,$t2,0xFFFF
0x0040000c  0x00000000  nop                       6     nop

When the assembler sees that we're using a hex constant (i.e. the 0x prefix), it tries to fulfill the value as an unsigned operation. So, it doesn't need to sign extend. And, the real andi can fulfill the request.

The result of this is 0x5550.
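The machine words in the Code column of the listings can be reproduced from the standard MIPS I-type layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate). The sketch below uses an illustrative helper, encode_i_type, to show that addi with -1 and andi with 0xFFFF carry the same 16 immediate bits. Only the opcode differs, and the opcode decides whether the hardware sign extends or zero extends that field.

```python
def encode_i_type(opcode, rs, rt, imm):
    """Pack a MIPS I-type instruction: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


ADDI, ANDI = 0x08, 0x0C    # primary opcodes
T2, T3 = 10, 11            # register numbers of $t2 and $t3

print(hex(encode_i_type(ADDI, T2, T3, -1)))       # 0x214bffff
print(hex(encode_i_type(ANDI, T2, T3, 0xFFFF)))   # 0x314bffff
```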

Note that if we had used a mask value of 0x1FFFF, that would still be unsigned. But it needs 17 bits, which is more than the 16-bit immediate field can hold, so the assembler would generate a multi-instruction sequence to fulfill the request.

And the result here would be 0x15550.
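As a rough summary of the three mask cases, here is a simplified Python model. It assumes the assembler's rule is "use a single andi if the mask fits in 16 unsigned bits, otherwise expand". That rule and the name and_with_mask are assumptions for illustration, not MARS's actual implementation.

```python
def and_with_mask(rs, mask):
    """Return (needs_expansion, result) for an andi-style pseudo-op.

    Assumed rule: a single andi suffices only when 0 <= mask <= 0xFFFF.
    """
    needs_expansion = not (0 <= mask <= 0xFFFF)
    result = rs & (mask & 0xFFFFFFFF)   # the full 32-bit mask is applied either way
    return needs_expansion, result


for mask in (0xFFFF, 0x1FFFF, -1):
    expanded, result = and_with_mask(0x55555550, mask)
    print(mask, expanded, hex(result))
# 65535 False 0x5550
# 131071 True 0x15550
# -1 True 0x55555550
```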