How can I alert on container restarts?

qingsong · Jan 3, 2017

I would like to monitor containers using Prometheus and cAdvisor so that I get an alert when a container restarts. Does anyone have a sample Prometheus alert rule for this?

Answer

I use the following Prometheus alert rule to detect containers that restart more than once within an hour (the time window can be adjusted). It may be helpful for you.

Prometheus Alert Rule Sample

# Pick one alert name (e.g. ContainerRestart or PodRestart) and one severity (critical, warning, or info).
# Use the counter name your kube-state-metrics version exposes; it may lack the _total suffix.
ALERT ContainerRestart
  IF rate(kube_pod_container_status_restarts_total[1h]) * 3600 > 1
  FOR 5s
  LABELS {action_required = "true", severity = "warning"}
  ANNOTATIONS {
    DESCRIPTION = "Pod {{$labels.namespace}}/{{$labels.pod}} restarted more than once during the last hour.",
    SUMMARY = "Container {{ $labels.container }} in Pod {{$labels.namespace}}/{{$labels.pod}} restarted more than once during the last hour."
  }
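
The rule's condition multiplies a per-second rate by 3600 to get restarts per hour. PromQL's increase() does the same calculation (the per-second rate times the number of seconds in the range), so the condition can also be written as shown in this sketch. It assumes the counter is exposed with the _total suffix; use whichever name your kube-state-metrics version exposes.

```promql
# Restarts per container over the last hour; fires when there is more than one.
increase(kube_pod_container_status_restarts_total[1h]) > 1
```

Because increase() extrapolates just like rate(), the result can be a non-integer value even though the counter only moves in whole steps.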

rate()

rate(v range-vector) calculates the per-second average rate of increase of the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. Also, the calculation extrapolates to the ends of the time range, allowing for missed scrapes or imperfect alignment of scrape cycles with the range's time period. The following example expression returns the per-second rate of HTTP requests as measured over the last 5 minutes, per time series in the range vector:

rate(http_requests_total{job="api-server"}[5m])

rate should only be used with counters. It is best suited for alerting, and for graphing of slow-moving counters.

Note that when combining rate() with an aggregation operator (e.g. sum()) or a function aggregating over time (any function ending in _over_time), always take a rate() first, then aggregate. Otherwise rate() cannot detect counter resets when your target restarts.
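
As a sketch of this ordering applied to the restart counter from this answer (grouping by namespace is an assumption chosen for illustration, and the metric name may lack the _total suffix in your setup), take rate() per series first and sum afterwards:

```promql
# rate() runs on each container's counter, so resets are handled per series;
# sum() then adds the per-second rates within each namespace.
sum(rate(kube_pod_container_status_restarts_total[1h])) by (namespace)
```

The reverse order would mean summing the raw counters first (for example in a recording rule) and applying rate() to the result, where a reset in one container can be hidden by the other series.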

kube_pod_container_status_restarts_total

Metric Type: Counter

Labels/Tags: container=container-name, namespace=pod-namespace, pod=pod-name

Description: The number of container restarts per pod
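
These labels can be used to narrow a query or an alert to part of the cluster. A minimal sketch, where the namespace value "production" is a hypothetical placeholder and the metric name may lack the _total suffix in your setup:

```promql
# Restart counters for every container in one namespace (hypothetical value).
kube_pod_container_status_restarts_total{namespace="production"}
```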