How to shield a CPU from the Linux scheduler (prevent it from scheduling threads onto that CPU)?

Steve Lorimer · Jun 20, 2012

It is possible to use sched_setaffinity to pin a thread to a cpu, which can increase performance in some situations.

From the linux man page:

Restricting a process to run on a single CPU also avoids the performance cost caused by the cache invalidation that occurs when a process ceases to execute on one CPU and then recommences execution on a different CPU.
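As a minimal sketch of the pinning described above: Python's os module wraps sched_setaffinity on Linux. The helper name pin_to_cpu is hypothetical, and the CPU used in the usage note is simply one the process is already allowed to run on.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict `pid` (0 means the caller) to a single CPU.

    Returns the affinity mask read back from the kernel afterwards.
    """
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)
```

For example, pin_to_cpu(min(os.sched_getaffinity(0))) pins the caller to the lowest-numbered CPU it may currently use. Note that this only constrains the pinned task; it does not stop the scheduler from placing other tasks on that CPU.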

Further, if I want a more real-time response, I can change the scheduling policy for that thread to SCHED_FIFO and raise its priority to some high value (up to sched_get_priority_max), meaning the thread in question should always pre-empt any other thread running on its cpu when it becomes ready.

However, at this point, the thread running on the cpu which the real-time thread just pre-empted will possibly have evicted much of the real-time thread's level-1 cache entries.
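The SCHED_FIFO switch mentioned above might look roughly like this in Python on Linux. The helper name try_make_fifo is hypothetical, and the sketch assumes the caller may lack the privilege to change policy (root or CAP_SYS_NICE is normally required), in which case it reports failure instead of raising.

```python
import os


def try_make_fifo(priority=None, pid=0):
    """Try to switch `pid` (0 means the caller) to SCHED_FIFO.

    `priority` defaults to the maximum allowed for SCHED_FIFO.
    Returns True on success and False if the kernel refuses the request.
    """
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    if priority is None:
        priority = highest
    if not lowest <= priority <= highest:
        raise ValueError(f"priority must be in [{lowest}, {highest}]")
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        # Usually a permission error: the caller is not allowed to
        # request a real-time policy at this priority.
        return False
    return True
```

Passing an explicit priority below the maximum is one way to leave headroom for other real-time tasks; what value is reasonable is exactly the open question (3) below.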

My questions are as follows:

  1. Is it possible to prevent the scheduler from scheduling any threads onto a given cpu? (eg: either hide the cpu completely from the scheduler, or some other way)
  2. Are there some threads which absolutely have to be able to run on that cpu? (eg: kernel threads / interrupt threads)
  3. If I need to have kernel threads running on that cpu, what is a reasonable maximum priority value to use such that I don't starve out the kernel threads?

Answer

Steve Lorimer · Oct 25, 2012

The answer is to use cpusets. The Python cpuset utility (cset) makes it easy to configure them.

Basic concepts

3 cpusets

  • root: present in all configurations and contains all cpus (unshielded)
  • system: contains cpus used for system tasks - the ones which need to run but aren't "important" (unshielded)
  • user: contains cpus used for "important" tasks - the ones we want to run in "realtime" mode (shielded)

The shield command manages these 3 cpusets.

During setup it moves all movable tasks into the unshielded cpuset (system), and during teardown it moves all movable tasks back into the root cpuset. After setup, the shield subcommand lets you move tasks into the shield (user) cpuset and, additionally, move special tasks (kernel threads) from root to system (and therefore out of the user cpuset).

Commands:

First we create a shield. Naturally the layout of the shield will be machine/task dependent. For example, say we have a 4-core non-NUMA machine: we want to dedicate 3 cores to the shield and leave 1 core for unimportant tasks. Since it is non-NUMA we don't need to specify any memory node parameters, and we leave the kernel threads running in the root cpuset (ie: across all cpus).

$ cset shield --cpu 1-3

Some kernel threads (those which aren't bound to specific cpus) can be moved into the system cpuset. (In general it is not a good idea to move kernel threads which have been bound to a specific cpu.)

$ cset shield --kthread on
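To get a feel for which kernel threads are likely bound to a specific cpu before moving anything, something like the following sketch can help. It is a heuristic and not how cset decides: it assumes the PF_KTHREAD bit (0x00200000 in the kernel headers) in the flags field of /proc/<pid>/stat identifies kernel threads, and it treats "affinity of exactly one cpu" as a stand-in for "bound". Both function names are hypothetical.

```python
import os

# Assumption: kernel-thread bit in the flags field of /proc/<pid>/stat.
PF_KTHREAD = 0x00200000


def is_kernel_thread(stat_line):
    """Return True if a /proc/<pid>/stat line has the PF_KTHREAD flag set."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' before tokenising the remaining fields.
    fields = stat_line.rpartition(")")[2].split()
    # Fields after comm: state, ppid, pgrp, session, tty_nr, tpgid, flags, ...
    return bool(int(fields[6]) & PF_KTHREAD)


def pinned_kernel_threads():
    """Return {pid: cpu} for visible kernel threads allowed on exactly one cpu."""
    pinned = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            with open(f"/proc/{pid}/stat") as f:
                if not is_kernel_thread(f.read()):
                    continue
            affinity = os.sched_getaffinity(pid)
        except (OSError, IndexError, ValueError):
            continue  # task exited, or it cannot be inspected
        if len(affinity) == 1:
            pinned[pid] = next(iter(affinity))
    return pinned
```

Threads reported by pinned_kernel_threads() are the ones to be wary of moving; inside a container the result may well be empty because kernel threads are not visible there.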

Now let's list what's running in the shield (user) or unshielded (system) cpusets. -v is verbose and lists the process names; add a 2nd -v to display more than 80 characters.

$ cset shield --shield -v
$ cset shield --unshield -v -v

If we want to stop the shield (teardown)

$ cset shield --reset

Now let's execute a process in the shield (commands following '--' are passed to the command to be executed, not to cset)

$ cset shield --exec mycommand -- -arg1 -arg2
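From inside a process started this way, it can be useful to confirm where it actually landed. The sketch below reads /proc/<pid>/cpuset and the Cpus_allowed_list field of /proc/<pid>/status; both function names are hypothetical, and the cpuset file is assumed to be absent on kernels built without cpuset support.

```python
import os


def current_cpuset(pid="self"):
    """Return the cpuset path of `pid` as reported by /proc, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/cpuset") as f:
            return f.read().strip()
    except OSError:
        return None


def allowed_cpus_field(status_text):
    """Extract the Cpus_allowed_list value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Cpus_allowed_list":
            return value.strip()
    return None


if __name__ == "__main__":
    print("cpuset:", current_cpuset())
    with open("/proc/self/status") as f:
        print("allowed cpus:", allowed_cpus_field(f.read()))
```

With the shield from the example above, a task running inside it would be expected to report a cpuset ending in "user" and an allowed list of 1-3.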

If we already have a running process, we can move it into the shield. Multiple processes can be moved by passing a comma-separated list or ranges (any process in the range will be moved, even if there are gaps):

$ cset shield --shield --pid 1234
$ cset shield --shield --pid 1234,1236
$ cset shield --shield --pid 1234,1237,1238-1240
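To illustrate what "even if there are gaps" means for a range, here is a small sketch that expands a pid specification and keeps only the pids that currently exist. It mirrors the behaviour described above but is not cset's own code; the function names are hypothetical.

```python
import os


def expand_pid_spec(spec):
    """Expand a spec such as "1234,1237,1238-1240" into a list of candidate pids."""
    pids = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            pids.extend(range(first, last + 1))
        else:
            pids.append(int(part))
    return pids


def existing_pids(spec):
    """Keep only the candidate pids that currently have an entry under /proc."""
    return [pid for pid in expand_pid_spec(spec) if os.path.exists(f"/proc/{pid}")]
```

So for 1238-1240, pids 1238 and 1240 would still be selected if 1239 does not exist.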

Advanced concepts

cset set/proc - these give you finer control of cpusets

Set

Create, adjust, rename, move and destroy cpusets

Commands

Create a cpuset, using cpus 1-3, use NUMA node 1 and call it "my_cpuset1"

$ cset set --cpu=1-3 --mem=1 --set=my_cpuset1

Change "my_cpuset1" to only use cpus 1 and 3

$ cset set --cpu=1,3 --mem=1 --set=my_cpuset1

Destroy a cpuset

$ cset set --destroy --set=my_cpuset1

Rename an existing cpuset

$ cset set --set=my_cpuset1 --newname=your_cpuset1

Create a hierarchical cpuset

$ cset set --cpu=3 --mem=1 --set=my_cpuset1/my_subset1

List existing cpusets (depth of level 1)

$ cset set --list

List existing cpuset and its children

$ cset set --list --set=my_cpuset1

List all existing cpusets

$ cset set --list --recurse

Proc

Manage threads and processes

Commands

List tasks running in a cpuset

$ cset proc --list --set=my_cpuset1 --verbose

Execute a task in a cpuset

$ cset proc --set=my_cpuset1 --exec myApp -- --arg1 --arg2

Moving a task

$ cset proc --toset=my_cpuset1 --move --pid 1234
$ cset proc --toset=my_cpuset1 --move --pid 1234,1236
$ cset proc --toset=my_cpuset1 --move --pid 1238-1340

Moving a task and all its sibling threads

$ cset proc --move --toset=my_cpuset1 --pid 1234 --threads
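To see which threads belong to a process (and would therefore be candidates for moving together with it), the thread ids can be read from /proc/<pid>/task. This is a minimal sketch with a hypothetical helper name, assuming "siblings" here means the other threads of the same process.

```python
import os


def thread_ids(pid):
    """Return the sorted thread ids (TIDs) of `pid`, read from /proc/<pid>/task."""
    return sorted(int(tid) for tid in os.listdir(f"/proc/{pid}/task"))
```

For a single-threaded process the list contains only the pid itself, since the main thread's TID equals the PID.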

Move all tasks from one cpuset to another

$ cset proc --move --fromset=my_cpuset1 --toset=system

Move unpinned kernel threads into a cpuset

$ cset proc --kthread --fromset=root --toset=system

Forcibly move kernel threads (including those that are pinned to a specific cpu) into a cpuset (note: this may have dire consequences for the system - make sure you know what you're doing)

$ cset proc --kthread --fromset=root --toset=system --force

Hierarchy example

We can use hierarchical cpusets to create prioritised groupings

  1. Create a system cpuset with 1 cpu (0)
  2. Create a prio_low cpuset with 1 cpu (1)
  3. Create a prio_med cpuset with 2 cpus (1-2)
  4. Create a prio_high cpuset with 3 cpus (1-3)
  5. Create a prio_all cpuset with all 4 cpus (0-3) (note this is the same as root; it is considered good practice to keep a separation from root)

To achieve the above, create prio_all, then create the subset prio_high under prio_all, and so on:

$ cset set --cpu=0 --set=system
$ cset set --cpu=0-3 --set=prio_all
$ cset set --cpu=1-3 --set=/prio_all/prio_high
$ cset set --cpu=1-2 --set=/prio_all/prio_high/prio_med
$ cset set --cpu=1 --set=/prio_all/prio_high/prio_med/prio_low
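The nesting works because each child cpuset may only use cpus that its parent has. A quick way to sanity-check a planned layout before creating it is sketched below; check_nesting and the layout dictionary are hypothetical and simply restate the hierarchy from the commands above.

```python
def check_nesting(cpusets):
    """Return (child, parent) pairs where the child uses cpus its parent lacks.

    `cpusets` maps a path such as "/prio_all/prio_high" to a set of cpu numbers.
    Top-level sets (whose parent is the root) are not checked.
    """
    violations = []
    for path, cpus in cpusets.items():
        parent = path.rsplit("/", 1)[0]
        if parent and parent in cpusets and not cpus <= cpusets[parent]:
            violations.append((path, parent))
    return violations


layout = {
    "/prio_all": {0, 1, 2, 3},
    "/prio_all/prio_high": {1, 2, 3},
    "/prio_all/prio_high/prio_med": {1, 2},
    "/prio_all/prio_high/prio_med/prio_low": {1},
}
```

An empty result from check_nesting(layout) means every child is a subset of its parent, which is what the kernel requires of cpuset hierarchies.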