Does the kernel's panic() function completely freeze every other process?

user1284631 · Nov 12, 2012

I would like confirmation that the kernel's panic() function, and others like kernel_halt() and machine_halt(), guarantee a complete freeze of the machine once triggered.

In other words, are all kernel and user processes frozen? Can panic() be preempted by the scheduler? Can interrupt handlers still run?

Use case: in case of a serious error, I need to be sure that the hardware watchdog resets the machine. To this end, I must make sure that no other thread or process keeps the watchdog alive, so I need to trigger a complete halt of the system. Currently, inside my kernel module, I simply call panic() to freeze everything.

Also, is the user-space halt command guaranteed to freeze the system?

Thanks.

Edit: according to http://linux.die.net/man/2/reboot, I think the best way is to use reboot(LINUX_REBOOT_CMD_HALT): "Control is given to the ROM monitor, if there is one".

Answer

user1284631 · Nov 16, 2012

Thank you for the comments above. After some research, I can now give a more complete answer to my own question:

At least on x86, reboot(LINUX_REBOOT_CMD_HALT) is the way to go. The library call invokes the reboot() system call (see: http://lxr.linux.no/linux+v3.6.6/kernel/sys.c#L433). For the LINUX_REBOOT_CMD_HALT command (see: http://lxr.linux.no/linux+v3.6.6/kernel/sys.c#L480), the system call calls kernel_halt() (defined here: http://lxr.linux.no/linux+v3.6.6/kernel/sys.c#L394). That function calls syscore_shutdown() to run all registered system-core shutdown callbacks, prints the "System halted" message, dumps the kernel log buffer (kmsg_dump()), and finally calls machine_halt(), which on x86 is a wrapper for native_machine_halt() (see: http://lxr.linux.no/linux+v3.6.6/arch/x86/kernel/reboot.c#L680).

native_machine_halt() stops the other CPUs (through machine_shutdown()) and then calls stop_this_cpu() to take the last remaining processor offline. The first thing stop_this_cpu() does is disable interrupts on the current processor, so the scheduler can no longer take control.

I am not sure why the reboot() system call still calls do_exit(0) after kernel_halt(). My interpretation: with all processors marked offline, reboot() calls do_exit(0) and ends itself; even if the scheduler were woken, there would be no enabled processor on which to run a task or service an interrupt, so the system is halted. I am not confident in this explanation, though, because stop_this_cpu() does not seem to return (it ends in an infinite loop). Maybe the call is just a safeguard in case stop_this_cpu() fails and returns: do_exit() would then cleanly end the current task. The panic() call placed after do_exit() is a last resort on top of that, since do_exit() itself never returns.

As for panic() (defined here: http://lxr.linux.no/linux+v3.6.6/kernel/panic.c#L69), the function first disables local interrupts, then stops all other processors by calling smp_send_stop(). From then on, the panicking task is the only thing running, on the only processor still alive, and with local interrupts disabled the scheduler cannot preempt it (preemption is driven by the timer interrupt, after all). Finally, if panic_timeout (set with the panic= boot parameter) is positive, panic() busy-waits for that many seconds and then calls emergency_restart() to restart the machine. If panic_timeout is 0, however, panic() re-enables local interrupts before entering its final infinite loop, so interrupt handlers can still run on that processor.

If you have better insight, please contribute.