What is the difference between native code, machine code and assembly code?

samaladeepak picture samaladeepak · Aug 8, 2010 · Viewed 49.5k times · Source

I'm confused about machine code and native code in the context of .NET languages.

What is the difference between them? Are they the same?

Answer

Timwi picture Timwi · Aug 8, 2010

The terms are indeed a bit confusing, because they are sometimes used inconsistently.

Machine code: This is the most well-defined one. It is code that uses the byte-code instructions which your processor (the physical piece of metal that does the actual work) understands and executes directly. All other code must be translated or transformed into machine code before your machine can execute it.

Native code: This term is sometimes used in places where machine code (see above) is meant. However, it is also sometimes used to mean unmanaged code (see below).

Unmanaged code and managed code: Unmanaged code refers to code written in a programming language such as C or C++, which is compiled directly into machine code. It contrasts with managed code, which is written in C#, VB.NET, Java, or similar, and executed in a virtual environment (such as .NET or the JavaVM) which kind of “simulates” a processor in software. The main difference is that managed code “manages” the resources (mostly the memory allocation) for you by employing garbage collection and by keeping references to objects opaque. Unmanaged code is the kind of code that requires you to manually allocate and de-allocate memory, sometimes causing memory leaks (when you forget to de-allocate) and sometimes segmentation faults (when you de-allocate too soon). Unmanaged also usually implies there are no run-time checks for common errors such as null-pointer dereferencing or array bounds overflow.

Strictly speaking, most dynamically-typed languages — such as Perl, Python, PHP and Ruby — are also managed code. However, they are not commonly described as such, which shows that managed code is actually somewhat of a marketing term for the really big, serious, commercial programming environments (.NET and Java).

Assembly code: This term generally refers to the kind of source code people write when they really want to write byte-code. An assembler is a program that turns this source code into real byte-code. It is not a compiler because the transformation is 1-to-1. However, the term is ambiguous as to what kind of byte-code is used: it could be managed or unmanaged. If it is unmanaged, the resulting byte-code is machine code. If it is managed, it results in the byte-code used behind-the-scenes by a virtual environment such as .NET. Managed code (e.g. C#, Java) is compiled into this special byte-code language, which in the case of .NET is called Common Intermediate Language (CIL) and in Java is called Java byte-code. There is usually little need for the common programmer to access this code or to write in this language directly, but when people do, they often refer to it as assembly code because they use an assembler to turn it into byte-code.