Monitoring CPU Utilization using Prometheus

Arnav Bose · Feb 21, 2018

I am trying to monitor the CPU utilization of the machine on which Prometheus is installed and running. I have a metric 'process_cpu_seconds_total', and I can compute its irate or rate, but I am not sure how to turn that into a percentage value for CPU utilization. Is there any way I can use the process_cpu_seconds_total metric to find the CPU utilization of the machine where Prometheus runs?

Answer

lambfrier · Feb 15, 2019

A late answer for others' benefit too:

If you just want to monitor the percentage of CPU that the Prometheus process itself uses, you can use process_cpu_seconds_total, e.g. something like:

avg by (instance) (irate(process_cpu_seconds_total{job="prometheus"}[1m]))
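
To read this as a percentage rather than a fraction, one option is to scale the result by 100. The following is a rough sketch rather than a definitive query: the 5m window and the switch from irate to rate (for a smoother graph) are my assumptions, and the job="prometheus" label is taken from the query above. Because the metric counts CPU time across all threads of the process, the value can exceed 100 on a multi-core machine.

```promql
# Approximate CPU usage of the Prometheus process, in percent of one core.
# The 5m range is an assumed value; pick one that covers several scrapes.
100 * avg by (instance) (rate(process_cpu_seconds_total{job="prometheus"}[5m]))
```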

However, if you want to monitor the CPU of the machine as a whole, as I suspect you do, you should set up Node Exporter and then use a query similar to the one above, with the metric node_cpu_seconds_total. E.g.:

avg by (instance,mode) (irate(node_cpu_seconds_total{mode!='idle'}[1m]))

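The result of rate or irate is already the utilization as a fraction (out of 1, not out of 100), since it is the number of CPU-seconds used per second of wall-clock time; multiply by 100 to get a percentage. It usually needs to be aggregated across the cores/CPUs on the machine.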
Brian Brazil's post on Prometheus CPU monitoring is very relevant and useful: https://www.robustperception.io/understanding-machine-cpu-usage
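
As a sketch of how the per-second fraction described above can be turned into a single machine-wide percentage, one common approach is to average the idle fraction across all cores and subtract it from 1. This assumes Node Exporter is being scraped and exposes node_cpu_seconds_total with a mode label, as in the query earlier in this answer; the 5m window is an assumed value, not something from the original question.

```promql
# Approximate overall CPU utilization per machine, in percent.
# avg by (instance) averages the idle fraction over every core of that instance.
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
```

A result of 25 would mean that, averaged over the window, the machine's cores spent about a quarter of their time in non-idle modes.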