What does ERESTARTSYS used while writing linux driver?

I'm a frog dragon picture I'm a frog dragon · Mar 6, 2012 · Viewed 23.7k times · Source

I'm learning about the blocking I/O functions for writing linux device driver and I'm wondering what is the usage of ERESTARTSYS. Consider the following:

Global variable :

wait_queue_head_t my_wait_q_head;
int read_avail = 0;


device_init() :

init_waitqueue_head(&my_wait_q_head);


device_read():

printk("I'm inside driver read!\n");
wait_event_interruptible(&my_wait_q_head, read_avail != 0);
printk("I'm awaken!\n");


device_write():

read_avail = 1;
wake_up_interruptible(&my_wait_q_head);


When I call the read() from user space, the command prompt hang until I call the write() as expected. The printk messages appear accordingly as well in dmesg. However, I'm seeing some of the drivers written like this :

Another version of device_read():

printk("I'm inside driver read!\n");
if(wait_event_interruptible(&my_wait_q_head, read_avail != 0))    
{return -ERESTARTSYS;}
printk("I'm awaken!\n");

I tested the second version of device_read() using the same method in user space, and the result is exactly the same, so, what's the use of ERESTARTSYS?

p/s: I've read the book Linux Device Driver on this but I don't get it, can someone give an example to eleborate?:

Once we get past that call, something has woken us up, but we do not know what. One possibility is that the process received a signal. The if statement that contains the wait_event_interruptible call checks for this case. This statement ensures the proper and expected reaction to signals, which could have been responsible for waking up the process (since we were in an interruptible sleep). If a signal has arrived and it has not been blocked by the process, the proper behavior is to let upper layers of the kernel handle the event. To this end, the driver returns -ERESTARTSYS to the caller; this value is used internally by the virtual filesystem (VFS) layer, which either restarts the system call or returns -EINTR to user space. We use the same type of check to deal with signal handling for every read and write implementation.

Source : http://www.makelinux.net/ldd3/chp-6-sect-2

Answer

Kaz picture Kaz · Mar 6, 2012

-ERESTARTSYS is connected to the concept of a restartable system call. A restartable system call is one that can be transparently re-executed by the kernel when there is some interruption.

For instance the user space process which is sleeping in a system call can get a signal, execute a handler, and then when the handler returns, it appears to go back into the kernel and keeps sleeping on the original system call.

Using the POSIX sigaction API's SA_RESTART flag, processes can arrange the restart behavior associated with signals.

In the Linux kernel, when a driver or other module blocking in the context of a system call detects that a task has been woken because of a signal, it can return -EINTR. But -EINTR will bubble up to user space and cause the system call to return -1 with errno set to EINTR.

If you return -ERESTARTSYS instead, it means that your system call is restartable. The ERESTARTSYS code will not necessarily be seen in user space. It either gets translated to a -1 return and errno set to EINTR (then, obviously, seen in user space), or it is translated into a system call restart behavior, which means that your syscall is called again with the same arguments (by no action on part of the user space process: the kernel does this by stashing the info in a special restart block).

Note the obvious problem with "same arguments" in the previous paragraph: some system calls can't be restarted with the same parameters, because they are not idempotent! For instance, suppose there is a sleep call like nanosleep, for 5.3 seconds. It gets interrupted after 5 seconds. If it restarts naively, it will sleep for another 5.3 seconds. It has to pass new parameters to the restarted call to sleep for only the remaining 0.3 seconds; i.e. alter the contents of the restart block. There is a way to do that: you stuff different arguments into the task's restart block and use the -ERESTART_RESTARTBLOCK return value.

To address the second question: what's the difference? Why not just write the read routine without checking the return value and returning -ERESTARTSYS? Well, because that is incorrect in the case that the wakeup is due to a signal! Do you want a read to return 0 bytes read whenever a signal arrives? That could be misinterpreted by user space as end of data. This kind of problem won't show up in test cases that don't use signals.