I am reading Building Applications on Mesos and came across the following statement:
cpus
This resource expresses how many CPU cores are available. Tasks may use fractional parts of a CPU—this is possible because Mesos slaves use CPU shares, rather than reserving specific CPUs. This means that, if you have 1.5 cpus reserved, your processes will be allowed to use a total of 1.5 seconds of CPU time each second. That could mean that, within a single executor, two processes each get 750 milliseconds of CPU time per second, or one process gets 1 second of CPU time and another gets 500 milliseconds of CPU time each in a given second. The benefit of using CPU shares is that if some task would be able to utilize more than its share, and no other task would use an otherwise idle CPU, the first task can potentially use more than its share. As a result, the cpus reserved provides a guaranteed minimum of CPU time available to the task—if additional capacity is available, it will be allowed to use more.
I can't understand "if you have 1.5 cpus reserved, your processes will be allowed to use a total of 1.5 seconds of CPU time each second". How can it use 1.5 seconds of CPU time each second?
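cpu=1.5 should stand for one and a half CPU cores. CPU time is summed over all cores, so a task can consume 1.5 seconds of CPU time per wall-clock second by running on more than one core at once. You can see in the Mesos Web UI how many cores each Mesos agent (slave) offers. That's pretty much what nproc shows, unless mesos-slave is configured to offer fewer CPUs. Mesos counts resources with a precision of 3 decimal places.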
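To make the "1.5 seconds of CPU time each second" wording concrete, here is a minimal Python sketch of the arithmetic. The helper name cpu_seconds_budget is made up for illustration and is not part of Mesos; it only assumes that the budget is the cpus value multiplied by the elapsed wall-clock time, summed over all cores.

```python
def cpu_seconds_budget(cpus: float, wall_seconds: float = 1.0) -> float:
    """CPU time, summed over all cores, that a task with `cpus` may use.

    Hypothetical helper for illustration only; not a Mesos API.
    """
    # Mesos tracks resources with three decimal places, so round the same way.
    return round(cpus * wall_seconds, 3)


# Two processes run in parallel on different cores for one wall-clock second:
# one is busy for the whole second, the other for only half of it.
process_a = 1.0  # CPU-seconds consumed on one core
process_b = 0.5  # CPU-seconds consumed on another core
assert process_a + process_b == cpu_seconds_budget(1.5)
```

The 1.5 CPU-seconds fit into a single wall-clock second only because the two processes run on different cores at the same time.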
Several flags influence how Mesos limits resources. For CPU the most important one is isolation (these are mesos-slave/mesos-agent settings):
--isolation=posix/cpu,posix/mem
No CPU limiting is applied; mesos-executor is just a process that runs other processes. You can use nice, e.g. nice -n -20 (highest priority, requires root), or cpulimit to influence kernel scheduling, but a Mesos setting such as cpu=0.1 won't be taken into consideration.
--isolation=cgroups/cpu,cgroups/mem
cgroups (part of the Linux kernel since 2.6.24) allow limiting the resources used by each process or group of processes. Some distributions do not enable memory limiting by default, so cgroup_enable=memory needs to be passed to the kernel. But let's focus on CPU. By default cgroups take a conservative approach, where cpu=1.0 means that at least one CPU core will be reserved for the task. But if no other task is running on the host, the task can consume all of the CPUs. Assume we have a host with 12 CPUs and two tasks running with cpu=2.0. Then each task might get up to 6 CPU cores (assuming no other Mesos task is running on that host)! This is very dangerous: when the cluster is at low load all tasks will look fine, but once there are many tasks, performance on some hosts will decrease.
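The 12-CPU example can be sketched in Python as a proportional split. This is a simplified model, not Mesos code: the helper contended_cores is hypothetical, the factor of 1024 shares per CPU is an assumption about how cpus is translated into cpu.shares, and every task is assumed to be CPU-bound with enough threads to use whatever it is given.

```python
from typing import List

CPU_SHARES_PER_CPU = 1024  # assumed conversion factor from cpus to cpu.shares


def contended_cores(host_cores: float, task_cpus: List[float]) -> List[float]:
    """Cores each task gets when all tasks are CPU-bound (hypothetical helper)."""
    if not task_cpus:
        return []
    shares = [cpus * CPU_SHARES_PER_CPU for cpus in task_cpus]
    total = sum(shares)
    # Shares are relative weights: the host is divided in proportion to them.
    return [host_cores * share / total for share in shares]


# Two tasks with cpu=2.0 on an otherwise empty 12-core host.
print(contended_cores(12, [2.0, 2.0]))  # [6.0, 6.0]

# The same host fully booked with six such tasks: each drops to its minimum.
print(contended_cores(12, [2.0] * 6))  # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
```

The second call shows why performance drops as the host fills up: the same task that enjoyed 6 cores now gets only the 2 it asked for.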
--cgroups_enable_cfs
CFS stands for Completely Fair Scheduler, and this flag takes a stricter approach. It is turned off by default, and not all distributions support it (you can use e.g. Docker's check-config.sh to verify support on your system). CFS will guarantee that each task can use at most the portion specified (e.g. cpu=2.5). This comes at a cost: a task cannot use more than its portion even when cores reserved by other tasks sit idle. So make sure you define your requirements well.
The last issue can be addressed by CPU oversubscription, which is described in the Mesos documentation.
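As a rough illustration of the hard cap, the Python sketch below converts a cpus value into a CFS quota and shows that usage is clipped even on an idle host. The 100 ms period is an assumption (it is the usual default of cpu.cfs_period_us), and both helper names are hypothetical rather than Mesos functions.

```python
CFS_PERIOD_US = 100_000  # assumed scheduling period of 100 ms, in microseconds


def cfs_quota_us(cpus: float, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time in microseconds a task may consume per period (hypothetical helper)."""
    return int(round(cpus * period_us))


def capped_cores(cpus: float, demanded_cores: float) -> float:
    """Cores a task actually gets under a hard cap, however idle the host is."""
    return min(cpus, demanded_cores)


print(cfs_quota_us(2.5))       # 250000, i.e. 250 ms of CPU time per 100 ms period
print(capped_cores(2.5, 8.0))  # 2.5, even if the remaining cores are idle
```

Compared with the shares-only setup, the task never sees more than 2.5 cores, so its performance looks the same on an empty host as on a busy one.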