How to understand CPU allocation in Mesos?

Nan Xiao · Dec 24, 2015

I am reading Building Applications on Mesos, and came across the following statement:

cpus
This resource expresses how many CPU cores are available. Tasks may use fractional parts of a CPU—this is possible because Mesos slaves use CPU shares, rather than reserving specific CPUs. This means that, if you have 1.5 cpus reserved, your processes will be allowed to use a total of 1.5 seconds of CPU time each second. That could mean that, within a single executor, two processes each get 750 milliseconds of CPU time per second, or one process gets 1 second of CPU time and another gets 500 milliseconds of CPU time each in a given second. The benefit of using CPU shares is that if some task would be able to utilize more than its share, and no other task would use an otherwise idle CPU, the first task can potentially use more than its share. As a result, the cpus reserved provides a guaranteed minimum of CPU time available to the task—if additional capacity is available, it will be allowed to use more.

I can't understand the statement "if you have 1.5 cpus reserved, your processes will be allowed to use a total of 1.5 seconds of CPU time each second". How can processes use 1.5 seconds of CPU time each second?

Answer

Tombart · Sep 8, 2016

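cpu=1.5 stands for one and a half CPU cores. CPU time is summed across cores, so on a multi-core machine a task can consume 1.5 seconds of CPU time per wall-clock second, for example with one core fully busy and another busy half the time. You can see in the Mesos Web UI how many cores each Mesos agent (slave) offers. That is pretty much what nproc shows, unless mesos-slave is configured to offer fewer CPUs. Mesos counts resources with a precision of 3 decimal places.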
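To make the "1.5 seconds of CPU time each second" figure concrete, here is a minimal Python sketch of the arithmetic. The function name and the per-core busy fractions are hypothetical, chosen only for illustration; nothing here calls Mesos.

```python
def cpu_seconds_per_second(busy_fractions):
    """Sum per-core busy fractions into CPU-seconds used per wall-clock second.

    Each entry is the fraction of one wall-clock second that a core spent
    running the task's processes (illustrative numbers, not measurements).
    """
    return sum(busy_fractions)


# One process runs for a full second on one core, another for 500 ms on a second core.
one_plus_half = cpu_seconds_per_second([1.0, 0.5])

# Two processes each run for 750 ms on separate cores.
two_times_750ms = cpu_seconds_per_second([0.75, 0.75])

print(one_plus_half, two_times_750ms)
```

Both cases add up to 1.5 CPU-seconds per second, which matches the two scenarios described in the quoted passage.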

Several flags influence how Mesos limits resources. For CPU, the most important one is --isolation (these are mesos-slave/mesos-agent settings):

  • --isolation=posix/cpu,posix/mem No CPU limiting is applied; mesos-executor is just a process that runs other processes. You can use nice, e.g. nice -n -20 (for highest priority, which requires root), or cpulimit to influence kernel scheduling, but a Mesos setting such as cpu=0.1 won't be taken into consideration.
  • --isolation=cgroups/cpu,cgroups/mem cgroups (part of the Linux kernel since 2.6.24) allow limiting the resources used by each process or group of processes. Some distributions do not enable memory limiting by default, and cgroup_enable=memory needs to be passed to the kernel. But let's focus on CPU. By default, cgroups take a conservative approach where cpu=1.0 means that at least one CPU core will be reserved for the task. However, if no other task is running on the host, the task can consume all of the CPUs. Assume we have a host with 12 CPUs and two tasks running with cpu=2.0. Then each task might get up to 6 CPU cores (assuming no other Mesos task is running on that host). This is very dangerous: when the cluster is at low load all tasks will look fine, but once many tasks are running, performance on some hosts will degrade.
    • --cgroups_enable_cfs CFS stands for Completely Fair Scheduler; enabling this flag takes a stricter approach. It is turned off by default, and not all distributions support it (you can use e.g. Docker's check-config.sh to verify support on your system). With CFS enabled, each task can use at most the portion specified (e.g. cpu=2.5). This comes at a cost: no other process can use the reserved cores while a task is idle. So make sure you define your requirements well.
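The difference between the share-based default and the CFS hard limit can be sketched with a small model. This is a simplified illustration in Python, not the kernel's scheduler and not Mesos code: the function names, the `(cpus, demand)` task tuples, and the water-filling approach to proportional sharing are all assumptions made for the example.

```python
def share_based(host_cpus, tasks):
    """Work-conserving proportional sharing (simplified water-filling model).

    tasks maps a name to (cpus, demand): cpus is the Mesos cpu value used as
    the weight, demand is how many cores the task would use if unconstrained.
    """
    alloc = {name: 0.0 for name in tasks}
    active = set(tasks)
    remaining = float(host_cpus)
    while active and remaining > 1e-9:
        total_weight = sum(tasks[n][0] for n in active)
        # Tasks whose demand fits inside their proportional slice are satisfied.
        satisfied = {
            n for n in active
            if remaining * tasks[n][0] / total_weight >= tasks[n][1]
        }
        if not satisfied:
            # Everyone wants more than their slice: split by weight and stop.
            for n in active:
                alloc[n] = remaining * tasks[n][0] / total_weight
            break
        for n in satisfied:
            alloc[n] = tasks[n][1]
            remaining -= tasks[n][1]
        active -= satisfied
    return alloc


def cfs_capped(tasks):
    """Hard limit: a task never gets more than its cpu value, even on an idle host."""
    return {name: min(cpus, demand) for name, (cpus, demand) in tasks.items()}


# The 12-CPU host from the answer, with two cpu=2.0 tasks that would use every core.
busy = {"a": (2.0, 12.0), "b": (2.0, 12.0)}
print(share_based(12, busy))   # each task ends up with 6 cores
print(cfs_capped(busy))        # each task is held to 2 cores

# Same host, but task "b" is idle: "a" can take the whole machine under shares.
print(share_based(12, {"a": (2.0, 12.0), "b": (2.0, 0.0)}))
```

In this model, six busy cpu=2.0 tasks on the same host would each drop back to 2 cores under shares, which is the performance cliff the answer warns about; under the hard cap, each task sees 2 cores regardless of load.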

The last-mentioned issue could be addressed by CPU oversubscription, which is described in the Mesos documentation.