Ingress-Nginx Monitoring with Prometheus and Grafana

Ingress-nginx provides an easy integration with Prometheus for monitoring. However, it can be challenging to get started with monitoring Ingress-nginx and creating dashboards and alerts. Therefore, I’ve created a monitoring mixin for Ingress-nginx which will provide Prometheus alerts and Grafana dashboards focusing on Ingress-nginx.

You can find the source code for the alerts and dashboards in github/ingress-nginx-mixin.

There are two dashboards available:

Ingress-nginx Overview - An overview of Ingress-nginx request metrics, controller status and SSL certificates.
Ingress-nginx Request Handling Performance - A detailed view of request metrics, filterable by ingress.

The GitHub repository also contains Prometheus alerts that you can import. They fire on request failures and controller failures.

The dashboards and alerts are a work in progress. Feel free to share feedback in the ingress-nginx-mixin repository about what you would like to see or any issues you experience.

If you want to go directly to the dashboards, you can use the links above. The rest of this blog post will guide you through enabling metrics and describe the various alerts and dashboards.

Enabling Ingress-nginx Metrics

Ingress-nginx provides Prometheus metrics out of the box, and you can enable them by setting the following values in your Helm chart values file.

controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true

The above configuration enables the metrics and also creates a ServiceMonitor so that the Prometheus Operator scrapes them. If you do not use the Prometheus Operator, adjust the way you scrape metrics according to your setup. The metrics are available on port 10254 at the path /metrics of the Ingress-nginx controller.
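For setups without the Prometheus Operator, a plain Prometheus scrape job along the following lines might work. This is only a sketch: the ingress-nginx namespace and the app.kubernetes.io/name=ingress-nginx pod label are assumptions about how the chart was installed, so check them against your own deployment before using it.

```yaml
scrape_configs:
  - job_name: ingress-nginx
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - ingress-nginx # assumption: namespace the controller runs in
    relabel_configs:
      # Keep only the controller pods (assumed label, verify on your pods).
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        action: keep
        regex: ingress-nginx
      # Keep only the metrics port mentioned above.
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        action: keep
        regex: "10254"
```

With the pod role, Prometheus creates one target per declared container port, which is why the second rule filters on the port number.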

Grafana Dashboards

There are two dashboards rather than one, for three reasons. A single dashboard would contain too many graphs. Filters would apply to only a portion of the panels, since not all metrics contain the filtered labels, which would make it unclear when a filter applies. And some expensive metrics would put high pressure on your Prometheus backend.

The upcoming sections will describe each dashboard.

Ingress-nginx Overview Dashboard

The Ingress-nginx overview dashboard provides an overview of the request metrics, controller status and SSL certificates. It is built around the following sections:

Controller - Provides a section that summarizes requests by controller and the controller configuration status.
Ingress - Provides a section that displays ingress request volume, request success rates and request duration.
Certificates - Provides a section that displays SSL certificate expiry date.

ingress-nginx-overview

Ingress-nginx Request Handling Performance Dashboard

The Ingress-nginx request handling performance dashboard provides detailed insight into request metrics. It is built around the following sections:

Ingress Response Times - Provides a section that displays graphs for total request time and upstream response time.
Ingress Paths - Provides a section that breaks down request metrics by ingress path. However, per-path metrics are disabled by default in Ingress-nginx because of the high metric cardinality they cause. Therefore, you might only see a single path, which is /.

ingress-nginx-mixin-request-handling-performance

Prometheus Alerts

Alerts are tricky to get right for a generic use case, but the ingress-nginx-mixin provides them nonetheless. They are configurable through the config.libsonnet package in the repository, so if you are familiar with Jsonnet, customizing the alerts should be fairly straightforward. The alerts can be found on GitHub, and I describe each of them below.

Adjust any of the alerts and add any new ones that you require. Open issues and share feedback in the GitHub repository!

Application Alerts

Alert name: NginxConfigReloadFailed

Alerts when an Ingress-nginx configuration reload fails.
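As a rough illustration, a rule of this kind could look like the sketch below. It is not copied from the mixin. It relies on the controller's nginx_ingress_controller_config_last_reload_successful gauge, and the group name, the 5m duration and the severity are assumptions chosen for the example.

```yaml
groups:
  - name: ingress-nginx-example # hypothetical group name
    rules:
      - alert: NginxConfigReloadFailed
        # The gauge is 1 after a successful reload and 0 after a failed one.
        expr: nginx_ingress_controller_config_last_reload_successful == 0
        for: 5m # assumed duration, tune to your setup
        labels:
          severity: warning # assumed severity
        annotations:
          summary: Ingress-nginx configuration reload failed.
```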

Alert name: NginxHighHttp4xxErrorRate

Alerts when more than 5% of the requests to an Ingress-nginx ingress over the past 5 minutes returned a 4xx status code.

Alert name: NginxHighHttp5xxErrorRate

Alerts when more than 5% of the requests to an Ingress-nginx ingress over the past 5 minutes returned a 5xx status code.
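The error-rate alerts boil down to dividing the rate of failing requests by the rate of all requests per ingress. The sketch below shows that idea for 5xx responses using the nginx_ingress_controller_requests counter. It is an approximation rather than the mixin's exact rule: the group name, the grouping by the ingress label only, the 1m duration and the severity are assumptions.

```yaml
groups:
  - name: ingress-nginx-example # hypothetical group name
    rules:
      - alert: NginxHighHttp5xxErrorRate
        # Percentage of requests per ingress that returned 5xx in the last 5 minutes.
        # Add a namespace label to the grouping if ingress names repeat across namespaces.
        expr: |
          100 * (
            sum by (ingress) (rate(nginx_ingress_controller_requests{status=~"5.."}[5m]))
            /
            sum by (ingress) (rate(nginx_ingress_controller_requests[5m]))
          ) > 5
        for: 1m # assumed duration
        labels:
          severity: critical # assumed severity
```

Changing the status matcher to "4.." gives the equivalent 4xx expression.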

Note: Remember to adjust the thresholds to your setup in the config.libsonnet file before generating the alerts. You can also mute alerts or lower their severity using the config.

Summary

The Ingress-nginx mixin provides Prometheus alerts and Grafana dashboards for monitoring Ingress-nginx. The overview dashboard covers request metrics, controller status and SSL certificates, while the request handling performance dashboard provides detailed insight into request metrics. The alerts are generic and can be adjusted to your setup. The mixin is a work in progress, and feedback is welcome in the ingress-nginx-mixin repository.
