Easiest/Best Way to Learn the x86 Instruction Set?

mudge · Mar 18, 2010 · Viewed 7k times

I would like to learn the x86 Instruction Set Architecture. I don't mean learning an assembly language for x86. I want to understand the machine code, baby.

The reason is that I would like to write an assembler for x86. Then I want to write a compiler that compiles to that assembly.

I know that there are the Intel manuals and AMD manuals that cover the x86 instruction set. But those are very large and dense.

I'm wondering if there is a more approachable way, possibly a tutorial, to learn the x86 instruction set architecture.

Answer

claws · May 11, 2010

Well, I don't agree with you. The complexity of x86 is misunderstood and therefore exaggerated. I'm not saying that it isn't complex. It surely is, but that matters mainly if you want to write a full-fledged compiler or assembler. If you just want to learn assembly, it isn't that complex.

Let's break down the x86-64 architecture to prove my point.


Registers:

x86-64 specifies relatively few registers. How many exactly? Let's enumerate them:

  • 16 general-purpose registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP + R8, R9, R10, R11, R12, R13, R14, R15)
  • 6 segment registers (CS, DS, SS, ES, FS, GS)
  • 64-bit RFLAGS & 64-bit RIP
  • 8 80-bit floating-point (x87) registers (FPR0-FPR7), aliased to the 64-bit MMX registers (MM0-MM7)
  • 16 128-bit extended media registers (XMM0-XMM7 + XMM8-XMM15)
  • some special/miscellaneous registers that we hardly need to care about: control registers (CR0 through CR4, plus CR8 in 64-bit mode), debug registers (DR0 through DR3, plus DR6 and DR7), descriptor-table registers (GDTR, LDTR, IDTR), and a task register (TR). The old test registers (TR3 through TR7) existed only on the 386/486 and are not part of x86-64.

(Image: http://www.viva64.com/content/articles/64-bit-development/amd64_em64t/01-big.png)
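Each 64-bit general-purpose register can also be accessed through narrower names (for RAX: EAX, AX, AL and AH). Here is a small Python sketch, a toy model rather than any real library, of how those views alias one another. It captures one rule that trips people up in 64-bit mode: writing a 32-bit register zero-extends into the full 64-bit register, while 16-bit and 8-bit writes leave the upper bits alone. The `GPR` class and its method names are hypothetical; note also that the AH-style high-byte names exist only for RAX, RBX, RCX and RDX.

```python
MASK64 = (1 << 64) - 1


class GPR:
    """Toy model of one register such as RAX and its narrower views
    EAX, AX, AL and AH (hypothetical helper, not an existing API)."""

    def __init__(self, value=0):
        self.value = value & MASK64

    # Reads simply select bits of the 64-bit value.
    def read32(self):
        return self.value & 0xFFFF_FFFF        # EAX

    def read16(self):
        return self.value & 0xFFFF             # AX

    def read8l(self):
        return self.value & 0xFF               # AL

    def read8h(self):
        return (self.value >> 8) & 0xFF        # AH

    def write32(self, v):
        # In 64-bit mode a 32-bit write zero-extends: bits 32-63 become 0.
        self.value = v & 0xFFFF_FFFF

    def write16(self, v):
        # A 16-bit write leaves bits 16-63 untouched.
        self.value = (self.value & ~0xFFFF) | (v & 0xFFFF)

    def write8l(self, v):
        # An 8-bit write leaves bits 8-63 untouched.
        self.value = (self.value & ~0xFF) | (v & 0xFF)


rax = GPR(0x1122_3344_5566_7788)
print(hex(rax.read32()), hex(rax.read16()), hex(rax.read8h()))  # 0x55667788 0x7788 0x77
rax.write16(0xABCD)
print(hex(rax.value))  # 0x112233445566abcd
rax.write32(0xDEADBEEF)
print(hex(rax.value))  # 0xdeadbeef
```

This is the kind of detail an assembler can ignore but a compiler cannot: `mov eax, ...` clears the top half of RAX, `mov ax, ...` does not.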


Addressing Modes:

How do you reference a memory location?

Source: http://en.wikipedia.org/wiki/X86#Addressing_modes

Addressing modes for 32-bit address size on 32-bit or 64-bit x86 processors can be summarized by this formula:

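[CS: | DS: | SS: | ES: | FS: | GS:] [base + index * scale + displacement]

where base is EAX, EBX, ECX, EDX, ESP, EBP, ESI or EDI; index is EAX, EBX, ECX, EDX, EBP, ESI or EDI (ESP cannot be an index); scale is 1, 2, 4 or 8; and displacement is an 8-bit or 32-bit constant. The segment override, base, index and displacement are each optional.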

Addressing modes for 64-bit code on 64-bit x86 processors can be summarized by these formulas:

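[FS: | GS:] [base + index * scale + displacement]

where base is any of the 16 general-purpose registers, index is any of them except RSP, scale is 1, 2, 4 or 8, and displacement is a signed 8-bit or 32-bit constant. In 64-bit mode only the FS and GS overrides have an effect; the other segment bases are treated as zero.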

and

RIP + displacement (RIP-relative addressing with a signed 32-bit displacement)
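Since the goal is an assembler, here is a rough Python sketch of how the 64-bit `[base + index * scale + displacement]` form and the RIP-relative form turn into bytes for one instruction, `MOV r64, r/m64` (opcode REX.W 8B /r). Assumptions: Intel-syntax operand order, lowercase register names, and a deliberately simple strategy that always emits a SIB byte plus a 32-bit displacement. That encoding is valid, but a real assembler would usually pick a shorter form (for example an 8-bit displacement). The function names are made up for illustration.

```python
import struct

# Encoding numbers 0-15 of the general-purpose registers. The low 3 bits
# go into ModRM/SIB; bit 3 goes into the REX prefix (R8-R15 need it).
REG = {name: num for num, name in enumerate(
    ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
     "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"])}

SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def rex_w(reg=0, index=0, base=0):
    """REX prefix 0100WRXB with W=1 (64-bit operand size)."""
    return 0x48 | ((reg >> 3) << 2) | ((index >> 3) << 1) | (base >> 3)


def mov_load_sib(dst, base, index, scale, disp):
    """Encode MOV dst, [base + index*scale + disp] as REX.W 8B /r.

    Always uses mod=10 (SIB byte + 32-bit displacement). That form is valid
    for every base register, including RBP/R13, but it is not the shortest.
    """
    if index == "rsp":
        raise ValueError("RSP cannot be an index register")
    d, b, x = REG[dst], REG[base], REG[index]
    modrm = (0b10 << 6) | ((d & 7) << 3) | 0b100  # rm=100: a SIB byte follows
    sib = (SCALE_BITS[scale] << 6) | ((x & 7) << 3) | (b & 7)
    return bytes([rex_w(d, x, b), 0x8B, modrm, sib]) + struct.pack("<i", disp)


def mov_load_rip(dst, disp):
    """Encode MOV dst, [RIP + disp]; in 64-bit mode mod=00, rm=101 means RIP + disp32."""
    d = REG[dst]
    modrm = (0b00 << 6) | ((d & 7) << 3) | 0b101
    return bytes([rex_w(d), 0x8B, modrm]) + struct.pack("<i", disp)


print(mov_load_sib("rax", "rbx", "rcx", 4, 0x10).hex(" "))  # 48 8b 84 8b 10 00 00 00
print(mov_load_sib("r9", "r12", "r13", 8, -8).hex(" "))     # 4f 8b 8c ec f8 ff ff ff
print(mov_load_rip("rax", 0x10).hex(" "))                   # 48 8b 05 10 00 00 00
```

For the RIP-relative form, keep in mind that RIP already points at the next instruction, so the displacement is measured from the end of the instruction being encoded.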


Operation Modes:

These are the modes in which it can operate:

  1. Real mode
  2. Protected mode
    • Virtual 8086 mode
  3. Long mode
    • 64-bit mode
    • Compatibility mode

Instruction Set:

You hear people saying it's a large instruction set. Well, there are around 500-600 instructions, but many of them are the same instruction with small variations, like CMPS/CMPSB/CMPSW/CMPSD/CMPSQ. If you group them like this, the number comes down to about 400 instructions.
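The grouping argument can be sketched in a few lines of Python: fold a mnemonic with a size suffix (B, W, D, Q) into its base name whenever that base name is also in the list. This is a naive, hypothetical heuristic for illustration only. For example, CMPSD and MOVSD are also the names of unrelated SSE2 scalar double-precision instructions, so a real count would have to group by opcode, not by spelling.

```python
from collections import defaultdict

SIZE_SUFFIXES = ("B", "W", "D", "Q")  # byte, word, doubleword, quadword


def group_mnemonics(mnemonics):
    """Group size-suffixed variants (CMPSB, CMPSW, ...) under their base
    mnemonic (CMPS), but only when the base itself is in the list."""
    known = set(mnemonics)
    groups = defaultdict(list)
    for m in mnemonics:
        base = m
        if m[-1] in SIZE_SUFFIXES and m[:-1] in known:
            base = m[:-1]
        groups[base].append(m)
    return dict(groups)


sample = ["CMPS", "CMPSB", "CMPSW", "CMPSD", "CMPSQ",
          "MOVS", "MOVSB", "MOVSW", "MOVSD", "MOVSQ",
          "ADD", "SUB", "AND", "XOR"]
groups = group_mnemonics(sample)
print(len(sample), "mnemonics ->", len(groups), "groups")  # 14 mnemonics -> 6 groups
```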

Do you feel it's very large? Then I have a few questions. How many functions does the C standard library have? How many functions does POSIX have? What about .NET and Java? How many classes and methods do they have? Do we have to know all of those functions/methods/classes? What approach do we take to learn these libraries?

Just learn a few from each. Roughly go through all of them to get a feel for what exists, and use the reference when you need it.

We can logically divide these instructions into the following categories:

  1. General-Purpose Instructions
    • Basic Data Manipulation (moving & copying)
    • Control Transfer (Jumps, Calls, Interrupts)
    • Arithmetic & Logic Instructions (ADD, SUB, AND, XOR, etc.)
    • String & Bit Oriented Instructions
    • System Calls
  2. System Instructions
  3. x87 Floating-Point Instructions
  4. 64-Bit Media (MMX) Instructions
  5. 128-Bit Media (SSE) Instructions
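To connect these categories to actual machine code, here is a toy Python decoder over a small, hand-picked table of instructions that, with the operands shown, encode to a fixed byte sequence. Each entry is tagged with one of the categories above. It is only a sketch: a real x86 decoder also has to handle prefixes (including REX), ModRM/SIB bytes, displacements and immediates, and this table covers just a handful of instructions.

```python
# Instructions whose encoding (with these exact operands) is a fixed byte
# sequence, tagged with the categories from the list above.
TABLE = {
    bytes([0x90]): ("nop", "General-purpose"),
    bytes([0xC3]): ("ret", "General-purpose: control transfer"),
    bytes([0xCC]): ("int3", "General-purpose: control transfer"),
    bytes([0x31, 0xC0]): ("xor eax, eax", "General-purpose: arithmetic & logic"),
    bytes([0x0F, 0x05]): ("syscall", "General-purpose: system calls"),
    bytes([0xF4]): ("hlt", "System"),
    bytes([0xD9, 0xEB]): ("fldpi", "x87 floating-point"),
    bytes([0x0F, 0x77]): ("emms", "64-bit media (MMX)"),
    bytes([0x0F, 0x57, 0xC0]): ("xorps xmm0, xmm0", "128-bit media (SSE)"),
}
MAX_LEN = max(len(k) for k in TABLE)


def decode(code):
    """Greedy longest-match lookup in TABLE; not a real x86 decoder."""
    out, i = [], 0
    while i < len(code):
        for n in range(min(MAX_LEN, len(code) - i), 0, -1):
            entry = TABLE.get(code[i:i + n])
            if entry:
                out.append(entry)
                i += n
                break
        else:  # no table entry matched at offset i
            raise ValueError(f"unknown byte 0x{code[i]:02x} at offset {i}")
    return out


for mnemonic, category in decode(bytes.fromhex("31c0 90 0f05 c3")):
    print(f"{mnemonic:16} {category}")
```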

That's it! That's all you need to know. Now, frankly, tell me: is it that complex?

Just get any good book on assembly language covering the x86 architecture. I would personally suggest "Assembly Language Programming in GNU/Linux for IA32 Architectures" by Rajat Moona because it's short and to the point and doesn't waste much of your time. However, it doesn't cover x86-64.

Once you are familiar with IA-32, read http://csapp.cs.cmu.edu/public/1e/public/docs/asm64-handout.pdf for x86-64.