Saturday, October 24, 2009

Kernel Wait Queues


One problem that might arise with the read operation is what to do when there’s no data yet, but we’re not at end-of-file. The default answer is “go to sleep waiting for data.” This section shows how a process is put to sleep, how it is awakened, and how an application can ask whether data is available without blindly issuing a read call and blocking. We then apply the same concepts to write.

Whenever a process must wait for an event (such as the arrival of data or the termination of a process), it should go to sleep. Sleeping causes the process to suspend execution, freeing the processor for other uses. At some future time, when the event being waited for occurs, the process will be woken up and will continue with its job.

There are several ways of handling sleeping and waking up in Linux, each suited to different needs. All, however, work with the same basic data type, a wait queue (wait_queue_head_t). A wait queue is exactly that—a queue of processes that are waiting for an event. Wait queues are declared and initialized as follows:

wait_queue_head_t my_queue;
init_waitqueue_head (&my_queue);


When a wait queue is declared statically (i.e., not as an automatic variable of a procedure or as part of a dynamically-allocated data structure), it is also possible to initialize the queue at compile time:

DECLARE_WAIT_QUEUE_HEAD (my_queue);

It is a common mistake to neglect to initialize a wait queue (especially since earlier versions of the kernel did not require this initialization); if you forget, the results will usually not be what you intended.

Once the wait queue is declared and initialized, a process may use it to go to sleep. Sleeping is accomplished by calling one of the variants of sleep_on, depending on how deep a sleep is called for.

sleep_on(wait_queue_head_t *queue);

Puts the process to sleep on this queue. sleep_on has the disadvantage of not being interruptible; as a result, the process can end up being stuck (and unkillable) if the event it’s waiting for never happens.

interruptible_sleep_on(wait_queue_head_t *queue);

The interruptible variant works just like sleep_on, except that the sleep can be interrupted by a signal. This is the form that device driver writers used for a long time, before wait_event_interruptible appeared.

sleep_on_timeout(wait_queue_head_t *queue, long timeout);
interruptible_sleep_on_timeout(wait_queue_head_t *queue, long timeout);


These two functions behave like the previous two, with the exception that the sleep will last no longer than the given timeout period. The timeout is specified in “jiffies”.
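Because the timeout is counted in jiffies rather than seconds, a caller usually scales a wall-clock delay by HZ, the number of timer ticks per second. The sketch below is a user-space illustration of that arithmetic only. ASSUMED_HZ and seconds_to_jiffies are hypothetical names, and the value 100 is an assumption; the kernel's real HZ depends on the architecture and configuration.

```c
#include <assert.h>

/* Assumption: 100 timer ticks per second. In the kernel this role is
 * played by the HZ constant, whose value varies between builds. */
#define ASSUMED_HZ 100L

/* Hypothetical helper: convert whole seconds into a jiffies count, the
 * unit expected by the *_timeout sleep functions. */
static long seconds_to_jiffies(long seconds)
{
    assert(seconds >= 0);   /* a negative delay makes no sense here */
    return seconds * ASSUMED_HZ;
}
```

Inside a driver, the equivalent would be to pass something like 2 * HZ as the timeout argument to cap the sleep at roughly two seconds.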

void wait_event(wait_queue_head_t queue, int condition);
int wait_event_interruptible(wait_queue_head_t queue, int condition);


These macros are the preferred way to sleep on an event. They combine waiting for an event and testing for its arrival in a way that avoids race conditions. They will sleep until the condition, which may be any boolean C expression, evaluates true. The macros expand to a while loop, and the condition is reevaluated over time; the behavior is different from that of a function call or a simple macro, where the arguments are evaluated only at call time. The latter macro is implemented as an expression that evaluates to 0 in case of success and -ERESTARTSYS if the loop is interrupted by a signal.

It is worth repeating that driver writers should almost always use the interruptible instances of these functions/macros. The noninterruptible version exists for the small number of situations in which signals cannot be dealt with, for example, when waiting for a data page to be retrieved from swap space. Most drivers do not present such special situations.

Of course, sleeping is only half of the problem; something, somewhere will have to wake the process up again. When a device driver sleeps directly, there is usually code in another part of the driver that performs the wakeup, once it knows that the event has occurred. Typically a driver will wake up sleepers in its interrupt handler once new data has arrived. Other scenarios are possible, however.

Just as there is more than one way to sleep, so there is also more than one way to wake up. The high-level functions provided by the kernel to wake up processes are as follows:

wake_up(wait_queue_head_t *queue);
This function will wake up all processes that are waiting on this event queue.

wake_up_interruptible(wait_queue_head_t *queue);
wake_up_interruptible wakes up only the processes that are in interruptible
sleeps. Any process that sleeps on the wait queue using a noninterruptible
function or macro will continue to sleep.

wake_up_sync(wait_queue_head_t *queue);

wake_up_interruptible_sync(wait_queue_head_t *queue);


Normally, a wake_up call can cause an immediate reschedule to happen, meaning that other processes might run before wake_up returns. The “synchronous” variants instead make any awakened processes runnable, but do not reschedule the CPU. This is used to avoid rescheduling when the current process is known to be going to sleep, thus forcing a reschedule anyway. Note that awakened processes could run immediately on a different processor, so these functions should not be expected to provide mutual exclusion.

If your driver is using interruptible_sleep_on, there is little difference between wake_up and wake_up_interruptible. Calling the latter is a common convention, however, to preserve consistency between the two calls.
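The difference between wake_up and wake_up_interruptible can be pictured with a small user-space model. This is only a sketch, not kernel code: struct model_sleeper, model_wake_up, and model_wake_up_interruptible are hypothetical names invented for the illustration, and the model ignores scheduling and locking entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the states a queued process can be in. */
enum model_state {
    MODEL_RUNNING,
    MODEL_INTERRUPTIBLE,    /* e.g. slept with interruptible_sleep_on */
    MODEL_UNINTERRUPTIBLE   /* e.g. slept with sleep_on */
};

struct model_sleeper {
    enum model_state state;
};

/* Models wake_up: every sleeper on the queue becomes runnable.
 * Returns the number of sleepers that were woken. */
static size_t model_wake_up(struct model_sleeper *q, size_t n)
{
    size_t woken = 0;

    assert(q != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (q[i].state != MODEL_RUNNING) {
            q[i].state = MODEL_RUNNING;
            woken++;
        }
    }
    return woken;
}

/* Models wake_up_interruptible: only interruptible sleepers are woken;
 * noninterruptible sleepers stay asleep. */
static size_t model_wake_up_interruptible(struct model_sleeper *q, size_t n)
{
    size_t woken = 0;

    assert(q != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (q[i].state == MODEL_INTERRUPTIBLE) {
            q[i].state = MODEL_RUNNING;
            woken++;
        }
    }
    return woken;
}
```

With a queue that holds only interruptible sleepers, the two model functions behave identically, which mirrors the remark above about drivers that sleep with interruptible_sleep_on.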

As an example of wait queue usage, imagine you want to put a process to sleep when it reads your device and awaken it when someone else writes to the device.

The following code does just that:

DECLARE_WAIT_QUEUE_HEAD(wq);

ssize_t sleepy_read (struct file *filp, char *buf, size_t count,
                     loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) going to sleep\n",
           current->pid, current->comm);
    interruptible_sleep_on(&wq);
    printk(KERN_DEBUG "awoken %i (%s)\n", current->pid, current->comm);
    return 0; /* EOF */
}

ssize_t sleepy_write (struct file *filp, const char *buf, size_t count,
                      loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) awakening the readers...\n",
           current->pid, current->comm);
    wake_up_interruptible(&wq);
    return count; /* succeed, to avoid retrial */
}

An important thing to remember with wait queues is that being woken up does not guarantee that the event you were waiting for has occurred; a process can be woken for other reasons, mainly because it received a signal. Any code that sleeps should do so in a loop that tests the condition after returning from the sleep.
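The same rule applies to user-space condition variables, which makes it easy to experiment with. The sketch below is a POSIX threads analogy, not kernel code; data_ready, events_consumed, model_reader, model_writer, and run_model are hypothetical names. The reader sleeps inside a while loop and proceeds only once the condition itself is true, so a wakeup with no event behind it cannot make it continue early.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wq   = PTHREAD_COND_INITIALIZER;
static int data_ready;       /* the condition being waited for */
static int events_consumed;  /* how many events the reader has handled */

static void *model_reader(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!data_ready)                 /* re-test after every wakeup */
        pthread_cond_wait(&wq, &lock);
    assert(data_ready);                 /* only reached when the event is real */
    data_ready = 0;                     /* consume the event */
    events_consumed++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void model_writer(void)
{
    /* A wakeup with no event behind it: a correct reader keeps sleeping. */
    pthread_mutex_lock(&lock);
    pthread_cond_broadcast(&wq);
    pthread_mutex_unlock(&lock);

    /* The real event: set the condition first, then wake the sleeper. */
    pthread_mutex_lock(&lock);
    data_ready = 1;
    pthread_cond_broadcast(&wq);
    pthread_mutex_unlock(&lock);
}

/* Starts one reader thread, plays the writer in the calling thread, and
 * waits for the reader to finish. Returns 0 on success, -1 on error. */
static int run_model(void)
{
    pthread_t reader;

    if (pthread_create(&reader, NULL, model_reader, NULL) != 0)
        return -1;
    model_writer();
    return pthread_join(reader, NULL) == 0 ? 0 : -1;
}
```

In a driver, the wait_event and wait_event_interruptible macros described earlier perform this loop for you; code that calls one of the sleep_on variants directly has to write the loop itself.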

2 comments:

  1. thanks for precise description with examples

  2. Is the wait queue part of task_struct, or where is it stored?
