What does multicore assembly language look like?

Paul Hollingsworth picture Paul Hollingsworth · Jun 11, 2009 · Viewed 49.6k times · Source

Once upon a time, to write x86 assembler, for example, you would have instructions stating "load the EDX register with the value 5", "increment the EDX" register, etc.

With modern CPUs that have 4 cores (or even more), at the machine code level does it just look like there are 4 separate CPUs (i.e. are there just 4 distinct "EDX" registers) ? If so, when you say "increment the EDX register", what determines which CPU's EDX register is incremented? Is there a "CPU context" or "thread" concept in x86 assembler now?

How does communication/synchronization between the cores work?

If you were writing an operating system, what mechanism is exposed via hardware to allow you to schedule execution on different cores? Is it some special priviledged instruction(s)?

If you were writing an optimizing compiler/bytecode VM for a multicore CPU, what would you need to know specifically about, say, x86 to make it generate code that runs efficiently across all the cores?

What changes have been made to x86 machine code to support multi-core functionality?

Answer

Nathan Fellman picture Nathan Fellman · Jun 13, 2009

This isn't a direct answer to the question, but it's an answer to a question that appears in the comments. Essentially, the question is what support the hardware gives to multi-threaded operation.

Nicholas Flynt had it right, at least regarding x86. In a multi threaded environment (Hyper-threading, multi-core or multi-processor), the Bootstrap thread (usually thread 0 in core 0 in processor 0) starts up fetching code from address 0xfffffff0. All the other threads start up in a special sleep state called Wait-for-SIPI. As part of its initialization, the primary thread sends a special inter-processor-interrupt (IPI) over the APIC called a SIPI (Startup IPI) to each thread that is in WFS. The SIPI contains the address from which that thread should start fetching code.

This mechanism allows each thread to execute code from a different address. All that's needed is software support for each thread to set up its own tables and messaging queues. The OS uses those to do the actual multi-threaded scheduling.

As far as the actual assembly is concerned, as Nicholas wrote, there's no difference between the assemblies for a single threaded or multi threaded application. Each logical thread has its own register set, so writing:

mov edx, 0

will only update EDX for the currently running thread. There's no way to modify EDX on another processor using a single assembly instruction. You need some sort of system call to ask the OS to tell another thread to run code that will update its own EDX.