In the Linux real-time process priority range 1 to 99, it's unclear to me which is the highest priority, 1 or 99.
Section 7.2.2 of "Understanding the Linux Kernel" (O'Reilly) says 1 is the highest priority, which makes sense considering that normal processes have static priorities from 100 to 139, with 100 being the highest priority:
"Every real-time process is associated with a real-time priority, which is a value ranging from 1 (highest priority) to 99 (lowest priority). "
On the other hand, the sched_setscheduler man page (RHEL 6.1) claims that 99 is the highest:
"Processes scheduled under one of the real-time policies (SCHED_FIFO, SCHED_RR) have a sched_priority value in the range 1 (low) to 99 (high)."
Which is the highest real-time priority?
I did an experiment to nail this down, as follows:
process1: RT priority = 40, CPU affinity = CPU 0. This process "spins" for 10 seconds so it won't let any lower-priority process run on CPU 0.
process2: RT priority = 39, CPU affinity = CPU 0. This process prints a message to stdout every 0.5 seconds, sleeping in between. Each message includes the elapsed time.
I'm running a 2.6.33 kernel with the PREEMPT_RT patch.
To run the experiment, I start process2 in one window (as root) and then start process1 (as root) in another window. The result is that process1 preempts process2, which does not get to run for the full 10 seconds.
In a second experiment, I change process2's RT priority to 41. In this case, process2 is not preempted by process1.
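For anyone who wants to repeat the experiment, here is a minimal sketch of how each test process could give itself a real-time priority. It is not the exact code I ran: the helper names rt_priority_is_valid() and set_fifo_priority() are made up for illustration, SCHED_FIFO is assumed as the policy, and pinning to CPU 0 is assumed to be done from outside (for example with taskset -c 0).

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if prio is a legal SCHED_FIFO
 * priority on this system, 0 otherwise. */
int rt_priority_is_valid(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    return lo != -1 && hi != -1 && prio >= lo && prio <= hi;
}

/* Hypothetical helper: switch the calling process to SCHED_FIFO at the
 * given priority. Returns 0 on success, -1 on failure (typically EPERM
 * when the process lacks the privilege to use real-time policies). */
int set_fifo_priority(int prio)
{
    struct sched_param sp;

    if (!rt_priority_is_valid(prio)) {
        errno = EINVAL;
        return -1;
    }

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = prio;

    /* pid 0 means "the calling process". */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

process1 would call set_fifo_priority(40) and then busy-loop for 10 seconds. process2 would call set_fifo_priority(39), or 41 for the second experiment, and then print and sleep in a loop.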
This experiment shows that a larger RT priority value in sched_setscheduler() has a higher priority. This appears to contradict what Michael Foukarakis pointed out from sched.h, but actually it does not. In sched.c in the kernel source, we have:
    static void
    __setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
    {
        BUG_ON(p->se.on_rq);

        p->policy = policy;
        p->rt_priority = prio;
        p->normal_prio = normal_prio(p);
        /* we are holding p->pi_lock already */
        p->prio = rt_mutex_getprio(p);
        if (rt_prio(p->prio))
            p->sched_class = &rt_sched_class;
        else
            p->sched_class = &fair_sched_class;
        set_load_weight(p);
    }
In the common case, where the task is not being boosted by priority inheritance, rt_mutex_getprio(p) does the following:

    return task->normal_prio;
And normal_prio() does the following for a task with a real-time policy:

    prio = MAX_RT_PRIO-1 - p->rt_priority;  /* <===== notice! */
    ...
    return prio;
In other words, we have (my own interpretation):
    p->prio = p->normal_prio = MAX_RT_PRIO - 1 - p->rt_priority
Wow! That is confusing! To summarize:
With p->prio, a smaller value preempts a larger value.
With p->rt_priority, a larger value preempts a smaller value. This is the real-time priority set using sched_setscheduler().
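To make the inversion concrete, here is a small user-space sketch of the mapping. It only mirrors the normal_prio() arithmetic quoted above and is not kernel code. The function names are my own, and MAX_RT_PRIO is assumed to be 100, which matches normal processes starting at static priority 100.

```c
#include <assert.h>

/* Assumed value of the kernel's MAX_RT_PRIO. */
#define MAX_RT_PRIO 100

/* Map a user-space rt_priority (1..99) to the kernel-internal p->prio,
 * using the same arithmetic as the normal_prio() line quoted above. */
int kernel_prio_from_rt_priority(int rt_priority)
{
    assert(rt_priority >= 1 && rt_priority < MAX_RT_PRIO);
    return MAX_RT_PRIO - 1 - rt_priority;
}

/* Returns nonzero if a task with rt_priority a would preempt a task
 * with rt_priority b: the smaller internal prio wins. */
int rt_preempts(int a, int b)
{
    return kernel_prio_from_rt_priority(a) < kernel_prio_from_rt_priority(b);
}
```

With this mapping, rt_priority 99 becomes p->prio 0 and rt_priority 1 becomes p->prio 98. In my experiment, process1 (rt_priority 40, p->prio 59) preempts process2 at rt_priority 39 (p->prio 60) but not at rt_priority 41 (p->prio 58).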