I have a question about the docker stats command, if anyone can help me. I am new to Docker and I want to monitor the CPU usage of a Docker container.
The physical machine has 8 cores (CPU0...CPU7). I have already created a container and limited its CPU resources to one core (CPU0) using the following command: docker run -itd --cpuset-cpus=0 -p 8081:8080 binfalse/bives-webapp
I stress the container by sending requests from JMeter and then monitor the CPU usage of the container via the docker stats command, which gives me values greater than 100%.
I don't understand why it reports more than 100% when only one core is allocated to the container. Do you have any idea why? Does this CPU value also represent the CPU usage of some system processes in addition to the container?
Thanks in advance for your help.
docker version:

Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:23:31 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:19:04 2017
 OS/Arch:      linux/amd64
 Experimental: true

docker info result:

Containers: 2
 Running: 1
 Paused: 0
 Stopped: 1
Images: 10
Server Version: 17.06.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 141
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-98-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.56GiB
Name: logti048131
ID: RHOG:IR6N:FVC4:YDI5:A6T4:QA4Y:DDYF:7HZN:AI3L:WVLE:BNHY:6YNV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
On Linux, cgroups and Docker CPU stats deal in "time slices" of CPU: the number of nanoseconds the CPU has been in use. To get a percentage, the container cgroup's "time used" value is compared to the overall system's "time available" value from /proc/stat.
Because the stored "time slice" values are cumulative, the current values are compared to the previously collected values to get a more instantaneous percentage. I think this comparison is the basis of the issue.
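The comparison itself is simple arithmetic. Here is a sketch of it in Go with made-up numbers (the function name and sample values are mine, not Docker's): the two cumulative counters are sampled roughly a second apart, the deltas are divided, and the result is scaled by the number of CPUs.

package main

import "fmt"

// cpuPercent mirrors the kind of calculation the CLI does; the name and the
// numbers below are made up for illustration.
func cpuPercent(prevContainerNS, curContainerNS, prevSystemNS, curSystemNS uint64, onlineCPUs float64) float64 {
	cpuDelta := float64(curContainerNS - prevContainerNS) // container cgroup "time used"
	systemDelta := float64(curSystemNS - prevSystemNS)    // system-wide "time available"
	if cpuDelta <= 0 || systemDelta <= 0 {
		return 0
	}
	return cpuDelta / systemDelta * onlineCPUs * 100
}

func main() {
	// One second apart on an 8-core box: the container burned ~0.99s of CPU,
	// while the whole system had 8s of CPU time available.
	fmt.Printf("%.1f%%\n", cpuPercent(5000000000, 5990000000, 100000000000, 108000000000, 8)) // ~99.0%
}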
The docker stats command actually does a lot of the leg work for this information in the client. The client queries for all containers, watches events for container starts and stops, and opens an individual stats stream for each running container. These streams of container stats are used to calculate the percentages on each dump of stats data from a stream.
For the container stats stream, the Docker daemon collects the system's used CPU time first. It then uses libcontainer to read in a container's cgroup files and parse the text into values (the stats data structures are defined in Docker's api/types package). That is all then sent to the client as a JSON response for processing.
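If you want to see the raw numbers the client is fed, you can hit the same stats endpoint yourself. A minimal sketch using Docker's Go SDK; the constructor and type names are from roughly the 17.06 era and have moved around in newer SDKs, so treat the exact identifiers as assumptions:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		panic(err)
	}
	// One-shot read (stream=false); the daemon fills this in from /proc/stat
	// and the container's cgroup files.
	resp, err := cli.ContainerStats(context.Background(), os.Args[1], false)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var v types.StatsJSON
	if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
		panic(err)
	}
	fmt.Println("container cpu total (ns):", v.CPUStats.CPUUsage.TotalUsage)
	fmt.Println("system cpu (ns):         ", v.CPUStats.SystemUsage)
	// The previous sample may be zeroed on some versions for a one-shot read.
	fmt.Println("previous sample (ns):    ", v.PreCPUStats.CPUUsage.TotalUsage, v.PreCPUStats.SystemUsage)
}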
I believe at least part of the problem stems from reading and parsing the /proc/stat system information and the container cgroup stats at different times. Every time the goroutine that reads the container info is delayed a bit, more nanoseconds are included in that sample than in the system sample. Because the collection process is scheduled to run every X seconds, the next read then includes fewer total nanoseconds, so the values can bounce up on a busy system and then back down by the same amount, as there is not a full 'tick' included in the second sample.
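To put rough numbers on that: suppose the container is pinned to one core and running flat out, each system sample spans exactly one second, but one cgroup read comes in about 50ms late. Plugging made-up deltas into the same formula as above:

package main

import "fmt"

func main() {
	const onlineCPUs = 8.0
	systemDelta := 8000000000.0 // 1s wall time * 8 cores, in ns
	// Made-up container deltas: the late read picks up ~1.05s of CPU time,
	// so the next interval only has ~0.95s left in it.
	for i, cpuDelta := range []float64{1050000000, 950000000} {
		fmt.Printf("tick %d: %.1f%%\n", i+1, cpuDelta/systemDelta*onlineCPUs*100)
	}
	// tick 1: 105.0%
	// tick 2: 95.0%
}

On a single pinned core that is already enough to push one reading past 100% even though the container never actually got more than one CPU.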
The issue compounds the more containers you run and the busier the system gets. The stats gathering and forwarding to the client seems to be a relatively heavyweight process; just running docker stats with a large number of containers is enough to cause more inaccuracy. My best guess is contention among the goroutines that are all trying to read the stats, but I'm not sure that would account for quite the level of inaccuracy Docker shows. Either I'm completely wrong or there's something else adding to the problem.
Each Docker container's usage is tracked in a cgroup. The CPU accounting information can be viewed via the cgroup file system:
→ find /sys/fs/cgroup/cpuacct/docker -type d
/sys/fs/cgroup/cpuacct/docker
/sys/fs/cgroup/cpuacct/docker/f0478406663bb57d597d4a63a031fc2e841de279a6f02d206b27eb481913c0ec
/sys/fs/cgroup/cpuacct/docker/5ac4753f955acbdf38beccbcc273f954489b2a00049617fdb0f9da6865707717
/sys/fs/cgroup/cpuacct/docker/a4e00d69819a15602cbfb4f86028a4175e16415ab9e2e9a9989fafa35bdb2edf
/sys/fs/cgroup/cpuacct/docker/af00983b1432d9ffa6de248cf154a1f1b88e6b9bbebb7da2485d94a38f9e7e15
→ cd /sys/fs/cgroup/cpuacct/docker/f0478406663bb57d597d4a63a031fc2e841de279a6f02d206b27eb481913c0ec
→ ls -l
total 0
-rw-r--r-- 1 root root 0 Nov 20 22:31 cgroup.clone_children
-rw-r--r-- 1 root root 0 Nov 20 04:35 cgroup.procs
-r--r--r-- 1 root root 0 Nov 20 21:51 cpuacct.stat
-rw-r--r-- 1 root root 0 Nov 20 21:51 cpuacct.usage
-r--r--r-- 1 root root 0 Nov 20 22:31 cpuacct.usage_all
-r--r--r-- 1 root root 0 Nov 20 21:51 cpuacct.usage_percpu
-r--r--r-- 1 root root 0 Nov 20 22:31 cpuacct.usage_percpu_sys
-r--r--r-- 1 root root 0 Nov 20 22:31 cpuacct.usage_percpu_user
-r--r--r-- 1 root root 0 Nov 20 22:31 cpuacct.usage_sys
-r--r--r-- 1 root root 0 Nov 20 22:31 cpuacct.usage_user
-rw-r--r-- 1 root root 0 Nov 20 22:31 notify_on_release
-rw-r--r-- 1 root root 0 Nov 20 22:31 tasks
→ cat cpuacct.usage_percpu
3625488147 6265485043 6504277830
Each value is the cumulative usage in nanoseconds on that CPU.
→ grep -w ^cpu /proc/stat
cpu 475761 0 10945 582794 2772 0 159 0 0 0
Values here are in USER_HZ == 1/100th of a second, so they get some conversion in Docker.
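Putting the two sources together, you can reproduce the calculation by hand outside Docker. A rough sketch, assuming the cgroupfs layout shown above, a full container ID as the argument, the usual USER_HZ of 100, and the 8 CPUs from the docker info; Docker's own /proc/stat reader is slightly pickier about which fields it sums, this just takes them all:

package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// containerNS reads the container's cumulative CPU time in nanoseconds from
// its cpuacct cgroup (cgroupfs driver layout, as in the docker info above).
func containerNS(id string) uint64 {
	b, err := os.ReadFile("/sys/fs/cgroup/cpuacct/docker/" + id + "/cpuacct.usage")
	if err != nil {
		panic(err)
	}
	n, _ := strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
	return n
}

// systemNS sums the aggregate "cpu" line of /proc/stat and converts USER_HZ
// ticks to nanoseconds (assumes USER_HZ == 100, i.e. getconf CLK_TCK).
func systemNS() uint64 {
	b, err := os.ReadFile("/proc/stat")
	if err != nil {
		panic(err)
	}
	firstLine := strings.SplitN(string(b), "\n", 2)[0] // "cpu  475761 0 10945 ..."
	var ticks uint64
	for _, f := range strings.Fields(firstLine)[1:] {
		v, _ := strconv.ParseUint(f, 10, 64)
		ticks += v
	}
	return ticks * (1000000000 / 100) // 1 tick == 10ms == 10,000,000ns
}

func main() {
	id := os.Args[1] // full container ID, e.g. f0478406663bb57d...
	c1, s1 := containerNS(id), systemNS()
	time.Sleep(time.Second)
	c2, s2 := containerNS(id), systemNS()

	const onlineCPUs = 8.0 // CPUs: 8 in the docker info above
	fmt.Printf("%.1f%%\n", float64(c2-c1)/float64(s2-s1)*onlineCPUs*100)
}

Because this reads both files back to back, it is less prone to the skew described above than a busy daemon juggling many stats streams, which is a decent way to sanity-check what docker stats is telling you.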