Kubernetes InitContainers are a neat way to run arbitrary code before your container starts. They ensure that certain preconditions are met before your app is up and running. For example, they allow you to:
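As a minimal sketch of one such use, the Pod below blocks the app container until a database service name resolves in cluster DNS. The Pod name, the my-db service, and the my-app image are hypothetical placeholders rather than anything from a real deployment.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                 # hypothetical Pod name
spec:
  initContainers:
    # Runs to completion before any container in `containers` is started.
    - name: wait-for-db
      image: busybox:1.36
      # Loop until the (hypothetical) my-db service resolves via cluster DNS.
      command: ['sh', '-c', 'until nslookup my-db; do echo waiting for my-db; sleep 2; done']
  containers:
    - name: app
      image: my-app:latest     # hypothetical application image
```

If the init container exits with a non-zero status, the kubelet restarts it according to the Pod's restartPolicy, and the app container stays unstarted until the init container succeeds.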
Unfortunately, InitContainers can fail, and when that happens you probably want to be notified, because your app will never start. Kube-state-metrics exposes plenty of Kubernetes cluster metrics for Prometheus. Combining the two, we can monitor and alert whenever we discover container problems. Recently, a pull request was merged that provides InitContainer data.
The metric kube_pod_init_container_status_last_terminated_reason tells us why a specific InitContainer failed to run, for example because it timed out or ran into errors.
To use the InitContainer metrics, deploy Prometheus and kube-state-metrics. Then add the kube-state-metrics service as a target in your Prometheus scrape_configs so that all the cluster metrics are pulled into Prometheus:
  - job_name: 'kube-state-metrics'
    static_configs:
      - targets: ['kube-state-metrics:8080']
kube_pod_init_container_status_last_terminated_reason carries a label, reason, that can take one of five values:
We want to be alerted whenever a series whose reason is not 'Completed' is scraped, because that means an InitContainer has failed to run. Here is an example alerting rule:
groups:
  - name: Init container failure
    rules:
      - alert: InitContainersFailed
        expr: kube_pod_init_container_status_last_terminated_reason{reason!="Completed"} == 1
        annotations:
          summary: '{{ $labels.container }} init failed'
          description: '{{ $labels.container }} has not completed init containers with the reason {{ $labels.reason }}'
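Before relying on the rule, it can be exercised offline with promtool's rule unit tests. The following is a sketch only: it assumes the alerting rule above is saved as init-container-alerts.yml (a hypothetical file name), and the namespace, pod, and container label values are made up for illustration.

```yaml
# init-container-alerts_test.yml (hypothetical file name)
rule_files:
  - init-container-alerts.yml   # assumed location of the alerting rule

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Simulated kube-state-metrics sample: an init container that ended with reason "Error".
      - series: 'kube_pod_init_container_status_last_terminated_reason{namespace="default", pod="my-app", container="wait-for-db", reason="Error"}'
        values: '1 1 1'
    alert_rule_test:
      - eval_time: 1m
        alertname: InitContainersFailed
        exp_alerts:
          # The rule has no `for:` clause, so the alert is expected to fire right away.
          - exp_labels:
              namespace: default
              pod: my-app
              container: wait-for-db
              reason: Error
            exp_annotations:
              summary: 'wait-for-db init failed'
              description: 'wait-for-db has not completed init containers with the reason Error'
```

Running `promtool test rules init-container-alerts_test.yml` evaluates the rule against the simulated series and reports whether the expected alert, labels, and annotations match.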
Happy monitoring!