The relation between privileged instructions, traps and system calls

datwelk picture datwelk · May 18, 2013 · Viewed 13.5k times · Source

I am trying to understand how a virtual machine monitor (VMM) virtualizes the CPU.

My understanding right now is that the CPU issues a protection fault interrupt when a privileged instruction is about to be executed while the CPU is in user mode. In high level languages like C, privileged instructions are wrapped inside system calls. For example, when an application needs the current date and time (instructions that interact with I/O devices are privileged), it calls a certain library function. The assembled version of this library function contains an instruction called 'int' that causes a trap in the CPU. The CPU switches from user mode to privileged mode and jumps to the trap handler the OS has provided. Each system call has its own trap handler. In this example, the trap handler reads the date and time from the hardware clock and returns, then the CPU switches itself from privileged to user mode. (source: http://elvis.rowan.edu/~hartley/Courses/OperatingSystems/Handouts/030Syscalls.html)

However, I am not quite sure this understanding is correct. This article mentions the notion that the (privileged) x86 popf instruction does not cause a trap, and thus complicates things for the VMM: http://www.csd.uwo.ca/courses/CS843a/papers/intro-vm.pdf. In my understanding the popf instruction should not cause a trap but a protection fault interrupt, when explicitly called by a user program and not through a system call.

So my two concrete questions are:

  • What happens when a user program executes a privileged instruction while the CPU is in user mode?
  • What happens when a user program performs a system call?

Answer

Wandering Logic picture Wandering Logic · Jun 13, 2013

In no particular order:

Your confusion is mainly caused by the fact that the operating systems community does not have standardized vocabulary. Here are some terms that get slung around that sometimes mean the same thing, sometimes not: exception, fault, interrupt, system call, and trap. Any individual author will generally use the terms consistently, but different authors define them differently.

There are 3 different kinds of events that cause entry into privileged mode.

  1. An asynchronous interrupt (caused, for example, by an i/o device needing service.)
  2. A system call instruction (int on the x86). (More generally in the x86 manuals these are called traps and include a couple of other instructions (for debuggers mostly.))
  3. An instruction that does something exceptional (illegal instruction, protection fault, divide-by-0, page fault, ...). (Different authors calls these exceptions, faults or traps. x86 manuals call these faults.)

Each interrupt, trap or fault has a different number associated with it.

In all cases:

  1. The processor enters privileged mode.
  2. The user-mode registers are saved somewhere.
  3. The processor finds the base address of the interrupt vector table, and uses the interrupt/trap/fault number as an offset into the table. This gives a pointer to the service routine for that interrupt/trap/fault.
  4. The processor jumps to the service routine. Now we are in protected mode, the user level state is all saved somewhere we can get at it, and we're in the correct code inside the operating system.
  5. When the service routine is finished it calls an interrupt-return instruction (iret on x86.) (This is the subtle distinction between a fault and a trap on x86: faults return to the instruction that caused the fault, traps return to the instruction after the trap.)

Note the confusing name "interrupt vector table." Even though it is called an interrupt table, it is used for faults and traps as well. (Which leads some authors to call everything an interrupt.)

The popf issue is rather subtle. This is essentially a bug in the x86 architecture. When popf executes from user mode it does not cause a trap or fault (or exception or interrupt or whatever you want to call it.) It simply acts as a noop.

Does this matter? Well, for a normal OS it doesn't really matter. If, on the other hand, you are implementing a virtual machine monitor (like VMWare or Xen or Hyper-V), the VMM is running in protected mode, and you'd like to run the guest operating systems in user mode and efficiently emulate any protected mode code. When the guest operating system uses a popf instruction you want it to generate a general protection fault, but it doesn't. (The cli and sti instructions do generate a general protection fault if called from user mode, which is what you want.)