How does sched_setaffinity() work?

poundifdef · Apr 20, 2009

I am trying to understand how the Linux syscall sched_setaffinity() works. This is a follow-on from my question here.

I have found this guide, which explains how to use the syscall and includes a pretty neat (working!) example.

So I downloaded the Linux 2.6.27.19 kernel sources.

I did a 'grep' for lines containing that syscall, and I got 91 results. Not promising.

Ultimately, I'm trying to understand how the kernel is able to set the instruction pointer for a specific core (or processor).

I am familiar with how single-core-single-thread programs work. One might issue a 'jmp foo' instruction, and this basically sets the IP to the memory address of the 'foo' label. But when one has multiple cores, one has to say "fetch the next instruction at memory address foo, and set the instruction pointer for core number 2 to begin execution there."

Where, in the assembly code, are we specifying which core performs that operation?

Back to the kernel code: what is important here? The file 'kernel/sched.c' has a function called sched_setaffinity(), but it returns type "long", which is inconsistent with its manual page (the manual page declares an int). Which of these modules shows the assembly instructions issued? What module is reading the 'task_struct', looking at the 'cpus_allowed' member, and then translating that into an instruction? (I've also thumbed through the glibc source, but I think it just makes a call to the kernel code to accomplish this task.)

Answer

Eduard - Gabriel Munteanu · Apr 20, 2009

sched_setaffinity() simply tells the scheduler which CPUs that process/thread is allowed to run on, then calls for a reschedule.

The scheduler actually runs on every CPU, so each CPU gets its own chance to decide which task to execute next on that particular CPU.

If you're interested in how you can actually call some code on other CPUs, I suggest you take a look at smp_call_function_single(). When the target is another CPU, it calls generic_exec_single(). The latter simply adds the function to the target CPU's call queue and, if the queue was empty, sends that CPU an inter-processor interrupt (IPI); the interrupt handler on the target CPU then runs the queued function.

Bottom line: there is no actual SMP variant of the 'jmp' instruction. Instead, code running on the other CPUs cooperates in order to accomplish the task.