Which real-time priority is the highest priority in Linux

David Steinhauer picture David Steinhauer · Jan 17, 2012 · Viewed 89.9k times · Source

In the Linux real-time process priority range 1 to 99, it's unclear to me which is the highest priority, 1 or 99.

Section 7.2.2 of "Understanding the Linux Kernel" (O'Reilly) says 1 is the highest priority, which makes sense considering that normal processes have static priorities from 100 to 139, with 100 being the highest priority:

"Every real-time process is associated with a real-time priority, which is a value ranging from 1 (highest priority) to 99 (lowest priority). "

On the other hand, the sched_setscheduler man page (RHEL 6.1) claims that 99 is the highest:

"Processes scheduled under one of the real-time policies (SCHED_FIFO, SCHED_RR) have a sched_priority value in the range 1 (low) to 99 (high)."

Which is the highest real-time priority?

Answer

David Steinhauer picture David Steinhauer · Feb 14, 2012

I did an experiment to nail this down, as follows:

  • process1: RT priority = 40, CPU affinity = CPU 0. This process "spins" for 10 seconds so it won't let any lower-priority process run on CPU 0.

  • process2: RT priority = 39, CPU affinity = CPU 0. This process prints a message to stdout every 0.5 second, sleeping in between. It prints out the elapsed time with each message.

I'm running a 2.6.33 kernel with the PREEMPT_RT patch.

To run the experiment, I run process2 in one window (as root) and then start process1 (as root) in another window. The result is process1 appears to preempt process2, not allowing it to run for a full 10 seconds.

In a second experiment, I change process2's RT priority to 41. In this case, process2 is not preempted by process1.

This experiment shows that a larger RT priority value in sched_setscheduler() has a higher priority. This appears to contradict what Michael Foukarakis pointed out from sched.h, but actually it does not. In sched.c in the kernel source, we have:

static void
__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
{
        BUG_ON(p->se.on_rq);

        p->policy = policy;
        p->rt_priority = prio;
        p->normal_prio = normal_prio(p);
        /* we are holding p->pi_lock already */
        p->prio = rt_mutex_getprio(p);
        if (rt_prio(p->prio))
                p->sched_class = &rt_sched_class;
        else
                p->sched_class = &fair_sched_class;
        set_load_weight(p);
}

rt_mutex_getprio(p) does the following:

return task->normal_prio;

While normal_prio() happens to do the following:

prio = MAX_RT_PRIO-1 - p->rt_priority;  /* <===== notice! */
...
return prio;

In other words, we have (my own interpretation):

p->prio = p->normal_prio = MAX_RT_PRIO - 1 - p->rt_priority

Wow! That is confusing! To summarize:

  • With p->prio, a smaller value preempts a larger value.

  • With p->rt_priority, a larger value preempts a smaller value. This is the real-time priority set using sched_setscheduler().