How to avoid printk log dropping in the Linux kernel

Balamurugan A · Oct 16, 2014

Are there any tips or methods to avoid dropped kernel log messages or log buffer overruns?

I have increased the log buffer size to the maximum with the code change below. I'm running on a high-end device only. But still, when I want to get the complete log from my driver (which writes heavy logs), I sometimes see printk logs being dropped. I use printk with KERN_INFO, enabled through dynamic debug (dprintk).

The change I made:

--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -55,7 +55,7 @@ void asmlinkage __attribute__((weak)) early_printk(const char *fmt, ...)
 {
 }

-#define __LOG_BUF_LEN  (1 << CONFIG_LOG_BUF_SHIFT)
+#define __LOG_BUF_LEN  (1 << 17)

The command I use to write the log into a file:

cat /proc/kmsg > /sdcard/klog.txt

While debugging only, I'm fine with degraded performance in my driver, but I don't want to drop any logs. I understand we can't make work queues/threads wait until printing completes. Still, is there any way to guarantee that logs are not dropped?

Answer

Changbin Du · Apr 1, 2017

Easy: just add "printk.synchronous=1" to the kernel command line.

printk() has become completely asynchronous. Adding printk.synchronous=1 makes printk() synchronous.

Refer to the patch "printk: Make printk() completely async":

Currently, printk() sometimes waits for a message to be printed to console and sometimes it does not (when console_sem is held by some other process). In case printk() grabs console_sem and starts printing to console, it prints messages from kernel printk buffer until the buffer is empty. When serial console is attached, printing is slow and thus other CPUs in the system have plenty of time to append new messages to the buffer while one CPU is printing. Thus the CPU can spend an unbounded amount of time doing printing in console_unlock(). This is especially serious problem if the printk() calling console_unlock() was called with interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing in console_unlock() (usually during boot when hundreds of SCSI devices are discovered) resulting in RCU stalls (CPU doing printing doesn't reach quiescent state for a long time), softlockup reports (IPIs for the printing CPU don't get served and thus other CPUs are spinning waiting for the printing CPU to process IPIs), and eventually a machine death (as messages from stalls and lockups append to printk buffer faster than we are able to print). So these machines are unable to boot with serial console attached. Another observed issue is that due to slow printk, hardware discovery is slow and udev times out before kernel manages to discover all the attached HW. Also during artificial stress testing SATA disk disappears from the system because its interrupts aren't served for too long.

This patch makes printk() completely asynchronous (similar to what printk_deferred() did until now). It appends the message to the kernel printk buffer and wake_up()s a special dedicated kthread to do the printing to console. This has the advantage that printing always happens from a schedulable context and thus we don't lockup any particular CPU or even interrupts. Also it has the advantage that printk() is fast and thus kernel booting is not slowed down by slow serial console. Disadvantage of this method is that in case of crash there is higher chance that important messages won't appear in console output (we may need working scheduling to print message to console). We somewhat mitigate this risk by switching printk to the original method of immediate printing to console if oops is in progress.

Async printk, for the time being, is considered to be less reliable than the synchronous one, so by default we keep printk operating in synchronous mode. There is a printk.synchronous kernel parameter which permits selecting sync/async mode as a boot parameter, or later from user space via a sysfs knob.

printk() is expected to work under different conditions and in different scenarios, including corner cases of OOM when all of the workers are busy (e.g. allocating memory), thus printk() uses its own dedicated printing kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit set we potentially can receive delays in printing until workqueue declares a ->mayday, as noted by Tetsuo Handa).