I am trying to monitor the CPU utilization of the machine on which Prometheus is installed and running. I have the metric process_cpu_seconds_total, and I can compute its rate or irate, but I am not sure how to turn that into a percentage value for CPU utilization. Is there any way I can use process_cpu_seconds_total to find the CPU utilization of the machine where Prometheus runs?
A late answer for others' benefit too:
If you just want to monitor the percentage of CPU that the Prometheus process uses, you can use process_cpu_seconds_total, e.g. something like:
avg by (instance) (irate(process_cpu_seconds_total{job="prometheus"}[1m]))
However, if you want to monitor the CPU of the machine as a whole, as I suspect you might, you should set up Node exporter and then use a query similar to the one above, with the metric node_cpu_seconds_total. E.g.:
avg by (instance,mode) (irate(node_cpu_seconds_total{mode!='idle'}[1m]))
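The rate or irate is equivalent to the percentage (as a fraction out of 1), since it measures how many seconds of CPU time were used per second of wall-clock time, but it usually needs to be aggregated across the cores/CPUs on the machine. Multiply the result by 100 to get a percentage value.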
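As a rough sketch of how those fractions turn into percentages, here are two separate queries. The 5m range and the job="prometheus" label are assumptions, so adjust them to your own scrape interval and job names; the second query only works once Node exporter is being scraped.

```promql
# Percentage of one core used by the Prometheus process itself.
# This can exceed 100 if the process keeps more than one core busy.
100 * rate(process_cpu_seconds_total{job="prometheus"}[5m])

# Overall machine CPU utilization in percent (needs Node exporter):
# average the idle fraction across all cores, then take the remainder.
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
```

Using rate over a longer window gives a smoother graph than irate, which only looks at the last two samples; which one is preferable depends on whether you are graphing or alerting.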
Brian Brazil's post on Prometheus CPU monitoring is very relevant and useful: https://www.robustperception.io/understanding-machine-cpu-usage