Django Monitoring with Prometheus and Grafana

The Prometheus package for Django (django-prometheus) provides a great Prometheus integration, but the open source dashboards and alerts built on top of it are lacking. The go-to Grafana dashboard does not use a large portion of the metrics that Django-Prometheus exposes, and it has no filters for views, methods, jobs or namespaces. This blog post introduces the Django-mixin, a set of Prometheus rules and Grafana dashboards for Django. The dashboards and alerts provide insights into applied/unapplied migrations, RED metrics (rate: requests per second; errors: the percentage of failed requests; duration: the latency of each request), database operations and the cache hit rate.

Table of Contents

Setting up Django-Prometheus
Grafana Dashboards
Alerts
Summary

Three dashboards have already been published to Grafana:

Django Overview - a simple overview of the database, cache and requests.
Django Requests Overview - an overview of all requests, filterable by view and method. It has separate graphs for app and admin views, as well as weekly breakdowns of top templates, top exceptions by type, top exceptions by view and top responses by view.
Django Requests by View - a breakdown of requests by view that shows computationally expensive metrics, such as latency buckets, alongside requests, responses and status codes.

There are also Prometheus alerts, stored in GitHub, that you can import. They cover RED metrics, database errors and unapplied migrations.

The dashboards and alerts are a work in progress, so feel free to share feedback in the django-mixin repository about what you would like to see or any issues you run into.

If you want to go directly to the dashboards, use the links above; the rest of this blog post describes setting up Django-Prometheus and the various alerts and dashboards.

Setting up Django-Prometheus

First, install Django-Prometheus with pip or Poetry:

pip install django-prometheus
poetry add django-prometheus  # alternative, if you use Poetry

All of the following settings belong in your settings.py. Add django_prometheus to INSTALLED_APPS:

INSTALLED_APPS = [
    ...
    "django_prometheus",
    ...
]

Add the Django-Prometheus request middlewares to MIDDLEWARE. PrometheusBeforeMiddleware must come before all other middlewares, and PrometheusAfterMiddleware must come after all of them:

MIDDLEWARE = [
    "django_prometheus.middleware.PrometheusBeforeMiddleware",
    ...  # your other middlewares
    "django_prometheus.middleware.PrometheusAfterMiddleware",
]
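Because this ordering is easy to break when more middleware is added later, a small check can guard it. The following is a minimal sketch, not part of Django-Prometheus; the helper name check_prometheus_middleware_order is hypothetical, and in a real project you would pass it settings.MIDDLEWARE, for example from a unit test.

```python
from __future__ import annotations

# Hypothetical helper (not part of django-prometheus): verifies that the
# Prometheus middlewares wrap every other middleware in the list.
BEFORE = "django_prometheus.middleware.PrometheusBeforeMiddleware"
AFTER = "django_prometheus.middleware.PrometheusAfterMiddleware"


def check_prometheus_middleware_order(middleware: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the order looks right."""
    problems = []
    if not middleware or middleware[0] != BEFORE:
        problems.append(f"{BEFORE} should be the first entry")
    if not middleware or middleware[-1] != AFTER:
        problems.append(f"{AFTER} should be the last entry")
    return problems


if __name__ == "__main__":
    example = [
        BEFORE,
        "django.middleware.security.SecurityMiddleware",
        "django.contrib.sessions.middleware.SessionMiddleware",
        AFTER,
    ]
    print(check_prometheus_middleware_order(example))  # prints []
```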

Change the database and cache backends to the Django-Prometheus backends:

DATABASES["default"]["ENGINE"] = "django_prometheus.db.backends.postgresql" # Adjust according to your database

CACHES = {
    "default": {
        "BACKEND": "django_prometheus.cache.backends.redis.RedisCache",
        ...
    }
}

Make sure migration metrics are enabled:

# `env` is assumed to be a django-environ Env() instance; a plain True also works
PROMETHEUS_EXPORT_MIGRATIONS = env.bool("PROMETHEUS_EXPORT_MIGRATIONS", True)

Lastly, add the Django-Prometheus URLs to your root urls.py:

from django.urls import include, path

urlpatterns = [
    ...
    path("prometheus/", include("django_prometheus.urls")),
    ...
]

Now you should be able to go to <my-url>/prometheus/metrics and see database, cache and request metrics! Add the target to Prometheus using your preferred approach and move on to the dashboard sections below.
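To confirm the endpoint works without opening a browser, you can fetch it and look for a few expected metric names. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the metric names in EXPECTED are assumptions based on Django-Prometheus' naming, so compare them with what your own endpoint returns.

```python
from __future__ import annotations

import urllib.request

# Assumed metric names; check them against your own /prometheus/metrics output.
EXPECTED = {
    "django_http_requests_total_by_method_total",
    "django_db_execute_total",
    "django_migrations_unapplied_total",
}


def metric_names(exposition: str) -> set[str]:
    """Collect sample names from a Prometheus text exposition payload."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        # A sample looks like `name{label="value"} 1.0` or `name 1.0`.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_metrics(exposition: str, expected: set[str] = EXPECTED) -> set[str]:
    """Return the expected metric names that do not appear in the payload."""
    return expected - metric_names(exposition)


def fetch(url: str) -> str:
    """Download the metrics page, e.g. http://localhost:8000/prometheus/metrics."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

For example, missing_metrics(fetch("http://localhost:8000/prometheus/metrics")) should return an empty set once everything is wired up (the host and port are placeholders). The parser only reads sample names, which is enough for a smoke test but is not a full implementation of the exposition format.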

Grafana Dashboards

As mentioned previously, the Django-mixin has three dashboards: a Django overview, a Django request overview and a Django request breakdown by view. They are split because a single dashboard would otherwise contain a very large number of graphs, filters would only apply to some of the panels (not all metrics carry the filtered labels, which makes it unclear when a filter applies), and some expensive metrics would put high pressure on your Prometheus backend.

The following sections describe each dashboard.

Django Overview Dashboard

The Django overview dashboard provides an overview of your entire system, including both the cache and the database. Its core sections are:

Requests - covers the requests to your Django application: the request volume (req/s) and the response status codes (2xx, 3xx, 4xx, 5xx).
Database - covers the database usage of your Django application: database operations (ops/s), database latency (p50, p95, p99, p99.9), currently open database connections, the total number of applied/unapplied migrations and a weekly breakdown of database errors.
Cache - covers the cache usage of your Django application: the cache hit rate as a percentage and the hit/miss volume (ops/s).
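The cache hit rate boils down to hits divided by all lookups (hits plus misses) over a time window. The Python sketch below shows that arithmetic, assuming you already have the increase of the hit and miss counters for the window; in Prometheus these would come from the Django-Prometheus cache counters (names such as django_cache_get_hits_total and django_cache_get_misses_total are assumed here). It is not the dashboard's actual query.

```python
from __future__ import annotations


def cache_hit_rate(hits_increase: float, misses_increase: float) -> float | None:
    """Return the hit rate as a percentage, or None if there were no lookups.

    Both arguments are how much the hit and miss counters grew over the
    same time window (for example, the last 5 minutes).
    """
    lookups = hits_increase + misses_increase
    if lookups == 0:
        return None  # an idle cache has no meaningful hit rate
    return 100.0 * hits_increase / lookups
```

For instance, 90 hits and 10 misses over the window give cache_hit_rate(90, 10) == 90.0. Returning None for an idle cache avoids reporting a misleading 0%.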

Screenshot: Django Overview dashboard

Django Requests Overview Dashboard

The Django requests overview dashboard provides an overview of just the requests to your system. Its core sections are:

Filters - filter by view and method; the filters apply to most panels.
Summary - the request volume (req/s), the success rate (% of responses that are not 4xx or 5xx), a p95 latency graph and a p95 request body size.
API views & others - a graph of response statuses by status and view, and a table of request latency by view, for non-admin views.
Admin views - a graph of response statuses by status and view, and a table of request latency by view, for admin views.
Weekly breakdowns - top templates, top exceptions by type, top exceptions by view and top responses by view.
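The success rate in the summary row is the share of responses whose status is not 4xx or 5xx. As a hedged illustration, the sketch below computes it, together with a grouping into status classes (2xx, 3xx, 4xx, 5xx), from a mapping of status code to response count over some window. The function names and the input shape are assumptions for the example, not the dashboard's PromQL.

```python
from __future__ import annotations


def status_class(status: int) -> str:
    """Map an HTTP status code to its class, e.g. 404 -> '4xx'."""
    return f"{status // 100}xx"


def by_class(responses_by_status: dict[int, float]) -> dict[str, float]:
    """Sum response counts per status class."""
    totals: dict[str, float] = {}
    for status, count in responses_by_status.items():
        cls = status_class(status)
        totals[cls] = totals.get(cls, 0) + count
    return totals


def success_rate(responses_by_status: dict[int, float]) -> float | None:
    """Percentage of responses that are not 4xx or 5xx; None without traffic."""
    total = sum(responses_by_status.values())
    if total == 0:
        return None
    failed = sum(
        count for status, count in responses_by_status.items() if 400 <= status < 600
    )
    return 100.0 * (total - failed) / total
```

With {200: 90, 301: 5, 404: 3, 500: 2}, five of the hundred responses are 4xx or 5xx, so success_rate returns 95.0. Note that client errors (4xx) count as failures here, matching the dashboard description.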

Screenshot: Django Requests Overview dashboard

Django Requests by View Dashboard

The Django requests by view dashboard provides a breakdown of specific views and visualizes the more expensive metrics, such as request latency. Its core sections are:

Filters - filter by view and method; the filters apply to most panels.
Summary - the request volume (req/s), the success rate (% of responses that are not 4xx or 5xx), a p95 latency graph and a p95 request body size.
Requests & responses - a graph of requests per second and a graph of response statuses grouped into the classes 200-299, 300-399, 400-499 and 500-599.
Latency & Status Codes - a graph of response statuses by individual status and method, and a graph of request latency percentiles (p50, p95, p99 and p99.9) computed from the latency buckets.
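Prometheus histograms only store cumulative counts per latency bucket, so percentile panels such as p50, p95, p99 and p99.9 are estimates. Prometheus' histogram_quantile function finds the bucket that contains the requested rank and interpolates linearly inside it. The Python sketch below reproduces that idea in simplified form; it is not Prometheus' code, it skips some of its edge-case handling, and the bucket values in the tests are made up.

```python
from __future__ import annotations

import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch: buckets must be sorted by upper bound and end with
    (math.inf, total_count), and q must be in (0, 1].
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need sorted buckets ending with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, no estimate
    rank = q * total
    # Find the first finite bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets[:-1]):
        if count >= rank:
            break
    else:
        # The rank falls into the +Inf bucket: return the highest finite bound.
        return buckets[-2][0]
    if i == 0 and upper <= 0:
        return upper
    lower, prev_count = (0.0, 0.0) if i == 0 else buckets[i - 1]
    # Linear interpolation, assuming observations are spread evenly in the bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)
```

With cumulative buckets [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)], the p50 estimate is 0.1 s, the p95 estimate is about 0.78 s, and the p99.9 estimate is capped at 1.0 s because that rank falls into the +Inf bucket. Each bucket is a separate time series for every view and method combination, which is one reason these latency panels are comparatively expensive to query.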

Screenshot: Django Requests by View dashboard

Alerts

Alerts are trickier to get right for a generic use case, but the Django-mixin still provides them. They are configurable through the config.libsonnet file in the repository; if you are familiar with Jsonnet, customizing the alerts should be fairly straightforward. The alerts can be found on GitHub, and each one is described below.

Alert name: DjangoMigrationsUnapplied

Alerts when Django has had unapplied migrations for longer than 15 minutes, which indicates that a new rollout finished but its migrations have not been run.
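Durations like "longer than 15 minutes" are typically expressed with the for clause of a Prometheus alerting rule: the alert stays pending while its condition holds and only fires once the condition has held for the whole duration, so the normal gap between a rollout and running migrations does not page anyone. The sketch below simulates that pending/firing logic in Python. It is a simplified model with a hypothetical alert_state helper; check the rule in the Django-mixin repository for the exact expression it uses.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def alert_state(
    samples: list[tuple[datetime, bool]],
    hold_for: timedelta = timedelta(minutes=15),
) -> str:
    """Return 'inactive', 'pending' or 'firing' after the latest evaluation.

    `samples` are (evaluation time, condition true?) pairs in time order,
    e.g. whether the unapplied migrations count was above zero.
    """
    active_since = None
    for ts, condition in samples:
        if condition:
            if active_since is None:
                active_since = ts  # the condition just became true
        else:
            active_since = None  # any false evaluation resets the timer
    if active_since is None:
        return "inactive"
    last_ts = samples[-1][0]
    return "firing" if last_ts - active_since >= hold_for else "pending"
```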

Alert name: DjangoDatabaseExceptions

Alerts when Django has hit database exceptions in the last 10 minutes.
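This alert relies on a database error counter increasing within the window. Counters reset to zero when a process restarts, so an increase has to treat a drop as a reset rather than a negative change. The sketch below shows that logic for a list of consecutive scraped values; unlike Prometheus' increase() it does not extrapolate to the window boundaries, and both helper names are hypothetical.

```python
from __future__ import annotations


def counter_increase(samples: list[float]) -> float:
    """Sum the increases between consecutive counter samples.

    A drop in value is treated as a counter reset (e.g. a worker restart):
    the counter restarted from zero, so the new value is the increase.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def database_errors_alert(samples: list[float]) -> bool:
    """Fire if any database error was recorded within the sampled window."""
    return counter_increase(samples) > 0
```

For example, counter_increase([3, 5, 5, 1, 4]) is 6.0: an increase of 2, then 0, then a reset that counts the new value 1, then 3.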

Alert name: DjangoHighHttp4xxErrorRate

Alerts when more than 5% of the HTTP requests to a specific view returned a 4xx status in the past 5 minutes.

Alert name: DjangoHighHttp5xxErrorRate

Alerts when more than 5% of the HTTP requests to a specific view returned a 5xx status in the past 5 minutes.
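Both error rate alerts compare, per view, the failed requests within a 5-minute window against all requests to that view in the same window. The sketch below shows that arithmetic with hypothetical view names and counts; the real rules are written in PromQL in the Django-mixin repository, and the 5% default threshold matches the descriptions above.

```python
from __future__ import annotations


def views_over_threshold(
    requests_by_view: dict[str, float],
    errors_by_view: dict[str, float],
    threshold_pct: float = 5.0,
) -> dict[str, float]:
    """Return {view: error %} for views whose error rate exceeds the threshold.

    Both dicts hold counter increases over the same window (e.g. 5 minutes);
    errors_by_view counts only 4xx responses (or only 5xx responses).
    """
    flagged = {}
    for view, total in requests_by_view.items():
        if total == 0:
            continue  # no traffic, so no meaningful ratio
        pct = 100.0 * errors_by_view.get(view, 0.0) / total
        if pct > threshold_pct:
            flagged[view] = pct
    return flagged
```

A view with little traffic can cross 5% with a single failed request, so in practice you may want to require a minimum request count before alerting; that guard is left out of this sketch.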

Adjust these and add any new ones that you require!

Summary

Django-Prometheus is a great library, and Grafana and Prometheus are excellent open source monitoring tools. The dashboards and alerts presented in this blog post should be easy to reuse and extend. I think they form a good basis for monitoring, but they can certainly be improved and adjusted, so if you have any suggestions, please open an issue in the Django-mixin GitHub repository. I'm looking for any input that helps standardize dashboards and alerts for Django over time!