I ran the following benchmark in qemu and in qemu-kvm, with this configuration:
CPU: AMD 4400 processor, dual core, with SVM enabled; 2 GB RAM
Host OS: openSUSE 11.3 with the latest patches, running KDE 4
Guest OS: FreeDOS
Emulated Memory: 256 MB
Network: Nil
Language: Turbo C 2.0
Benchmark Program: Count from 0000000 to 9999999, displaying the counter on the screen
by writing directly to the screen memory (i.e. 0xb800:xxxx)
It only takes 6 sec when running in qemu.
But it takes 89 sec when running in qemu-kvm.
I ran the benchmark one by one, not in parallel.
I scratched my head the whole night, but I still have no idea why this happens. Could somebody give me some hints?
KVM uses QEMU as its device emulator: every device operation is emulated by the user-space QEMU program. When you write to 0xB8000 you are touching the emulated graphics adapter, so the guest CPU has to perform a `vmexit' out of guest mode and return to the KVM kernel module, which in turn forwards the device emulation request to the user-space QEMU backend.
In contrast, QEMU without KVM does all of this work in a single process (apart from ordinary system calls), so there are far fewer CPU context switches. Meanwhile, your benchmark is a simple loop whose code block only needs to be translated once. That costs almost nothing compared to the vmexit and the kernel-to-user round trip on every iteration in the KVM case.
This is the most probable cause.