How to read / write .exe machine code manually?

Peter Perháč picture Peter Perháč · Apr 16, 2009 · Viewed 13.7k times · Source

I am not well acquainted to the compiler magic. The act of transforming human-readable code (or the not really readable Assembly instructions) into machine code is, for me, rocket science combined with sorcery.

I will narrow down the subject of this question to Win32 executables (.exe). When I open these files up in a specialized viewer, I can find strings (usually 16b per character) scattered at various places, but the rest is just garbage. I suppose the unreadable part (majority) is the machine code (or maybe resources, such as images etc...).

Is there any straightforward way of reading the machine code? Opening the exe as a file stream and reading it byte by byte, how could one turn these individual bytes into Assembly? Is there a straightforward mapping between these instruction bytes and the Assembly instruction?

How is the .exe written? Four bytes per instruction? More? Less? I have noticed some applications can create executable files just like that: for example, in ACD See you can export a series of images into a slideshow. But this does not necessarily have to be a SWF slideshow, ACD See is also capable of producing EXEcutable presentations. How is that done?

How can I understand what goes on inside an EXE file?

Answer

dreamlax picture dreamlax · Apr 16, 2009

OllyDbg is an awesome tool that disassembles an EXE into readable instructions and allows you to execute the instructions one-by-one. It also tells you what API functions the program uses and if possible, the arguments that it provides (as long as the arguments are found on the stack).

Generally speaking, CPU instructions are of variable length, some are one byte, others are two, some three, some four etc. It mostly depends on the kind of data that the instruction expects. Some instructions are generalised, like "mov" which tells the CPU to move data from a CPU register to a place in memory, or vice versa. In reality, there are many different "mov" instructions, ones for handling 8-bit, 16-bit, 32-bit data, ones for moving data from different registers and so on.

You could pick up Dr. Paul Carter's PC Assembly Language Tutorial which is a free entry level book that talks about assembly and how the Intel 386 CPU operates. Most of it is applicable even to modern day consumer Intel CPUs.

The EXE format is specific to Windows. The entry-point (i.e. the first executable instruction) is usually found at the same place within the EXE file. It's all kind of difficult to explain all at once, but the resources I've provided should help cure at least some of your curiosity! :)