Monitoring Kubernetes InitContainers with Prometheus

5 months ago Popular post!

1 min read

Kubernetes InitContainers are a neat way to run arbitrary code before your container starts. It ensures that certain pre-conditions are met before your app is up and running. For example it allows you to:

  • run database migrations with Django or Rails before your app starts
  • ensure a microservice or API you depend on to is running

Unfortunately InitContainers can fail and when that happens you probably want to be notified because your app will never start. Kube-state-metrics exposes plenty of Kubernetes cluster metrics for Prometheus. Combining the two we can monitor and alert whenever we discover container problems. Recently, a pull-request was merged that provides InitContainer data.

The metric kube_pod_init_container_status_last_terminated_reason tells us why a specific InitContainer failed to run; whether it's because it timed out or ran into errors.

To use the InitContainer metrics deploy Prometheus and kube-state-metrics. Then target the metrics server in your Prometheus scrape_configs to ensure we're pulling all the cluster metrics into Prometheus:

- job_name: 'kube-state-metrics'
    - targets: ['kube-state-metrics:8080']

kube_pod_init_container_status_last_terminated_reason contains the metric label reason that can be in five different states:

  • Completed
  • OOMKilled
  • Error
  • ContainerCannotRun
  • DeadlineExceeded

We want to be alerted whenever a metric that is not 'Completed' is scraped because that means an InitContainer has failed to run. Here is an example alerting rule.

  - name: Init container failure
      - alert: InitContainersFailed
        expr: kube_pod_init_container_status_last_terminated_reason{reason!="Completed"} == 1
          summary: '{{ $labels.container }} init failed'
          description: '{{ $labels.container }} has not completed init containers with the reason {{ $labels.reason }}'

Happy monitoring!

Similar Blogs

7 months ago
mailgun statuscake terraform cloudflare devops s3 rds django

Kickstarting Infrastructure for Django Applications with Terraform

8 min read

When creating Django applications or using cookiecutters as Django Cookiecutter you will have by default a number of dependencies that will be needed to be created as a S3 bucket, a Postgres Database and a …

8 months ago Popular post!
devops gitlab-ci kaniko automation ci/cd

Creating templates for Gitlab CI Jobs

4 min read

Writing Gitlab CI templates becomes repetitive when you have similar applications running the same jobs. If a change to a job is needed it will be most likely needed to do the same change in …

5 months ago
terraform ci makisu devops gitlab-ci kaniko

6 ways to speed up your CI

8 min read

Waiting for CI to finish slows down development and can be extremely annoying, especially when CI fails and you have to run it again. Let's take a look into approaches on how to speed up …