What does actual machine code look like at various points?

Igorio · Apr 26, 2012 · Viewed 8.2k times

There seem to be many opinions about what machine code actually is. I've heard some say it's assembly, or binary, or hex.

Is it correct to say that machine code is essentially a set of instructions for a particular processor? If so, I imagine these can be represented in binary or hexadecimal notation, or assembly. But what does the non-translated "actual" machine code look like? Is it based on the word size of the architecture? Or is hexadecimal for all intents and purposes the default representation?

What does it look like when sitting on a hard drive? What does it look like when sitting in a register? How about when it's being processed, is it simply a set of voltage changes at that point?

Answer

Kendall Frey · Apr 26, 2012

Machine code is simply binary data that corresponds to CPU instructions for a specific processor architecture.

I won't go into too much detail about how it is stored, because that depends on where it is stored. On disk, for example, it is generally stored as a sequence of magnetized regions. Machine code is no different from any other binary data in this respect. If your question is really about how data is stored on a computer, you should research the various data-storage devices in a computer, such as HDDs, RAM, and registers, to name a few.

The easiest way to visualize how machine code is stored is to look at some in a hex editor. This shows you the binary data represented by hex numbers. For example, take the instruction:

0xEB 0xFE

This could just as easily be written as 1110101111111110 in binary, or 60414 in decimal. It depends on how you want to render the binary data in human-readable form.
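To illustrate, here is a small Python sketch showing those three renderings of the same two bytes (the names `code` and `value` are just for this example):

```python
# The same two machine-code bytes, 0xEB 0xFE, rendered in
# hexadecimal, binary, and decimal notation.
code = bytes([0xEB, 0xFE])
value = int.from_bytes(code, byteorder="big")

print(hex(value))  # 0xebfe
print(bin(value))  # 0b1110101111111110
print(value)       # 60414
```

All three are the same 16 bits; only the notation changes.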

This instruction represents an infinite loop. (This assumes it is being run on an x86 CPU; other CPUs could interpret it however they want.) It can be coded in assembly like this:

j:
jmp j

When you run the assembler, it takes the above code and turns it into the binary machine code above.
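You can mimic the assembler's output and the hex-editor view with a short Python sketch (the filename `loop.bin` is made up for this example; a real assembler such as NASM produces the same two bytes for a jump-to-self):

```python
# Write the two machine-code bytes to a file, then read them back
# and dump them the way a hex editor would display them.
machine_code = bytes([0xEB, 0xFE])

with open("loop.bin", "wb") as f:
    f.write(machine_code)

with open("loop.bin", "rb") as f:
    data = f.read()

print(" ".join(f"{b:02X}" for b in data))  # EB FE
```

On disk the file is just those two bytes; the "EB FE" text is only how the hex editor chooses to show them to you.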

The instruction really has two parts. The first is what is known as the opcode: the byte 0xEB, which on x86 is a short relative jump. It tells the CPU to read the next byte as a signed 8-bit offset and jump that many bytes. The CPU then reads the byte 0xFE and, since it expects a signed integer, interprets it as the number -2. Having finished reading the instruction, the instruction pointer moves forward 2 bytes. The instruction is then executed, moving the instruction pointer forward by -2 (0xFE) bytes, which sets it back to the value it had when the instruction started, so the CPU executes the same jump forever.
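The signed interpretation and the instruction-pointer bookkeeping can be sketched in Python (the starting address 0x100 is arbitrary, chosen just for this example):

```python
# 0xFE interpreted as a signed 8-bit integer is -2.
offset = int.from_bytes(bytes([0xFE]), byteorder="big", signed=True)
print(offset)   # -2

# Instruction-pointer bookkeeping for "EB FE" located at address 0x100:
ip = 0x100
ip += 2         # pointer advances past the 2-byte instruction
ip += offset    # the jump applies the -2 offset
print(hex(ip))  # 0x100 -- right back where we started
```

The same byte, 0xFE, would be 254 if read as unsigned; the opcode is what tells the CPU to treat it as signed.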

I hope this answers your question. If you are wondering about the internal workings of CPUs, read up on microcode and electronic logic gates. Basically, it's a bunch of voltage differences, such as a 1 bit being a 5 volt charge and a 0 bit being a 0 volt charge.