
Best Practices for a Performant Django Admin

5 months ago

The admin interface that ships with Django is one of Django's great strengths. It comes with a ton of features out of the box, and many open source packages extend its base functionality even further. It is well documented and works very well; the only pain point I've found is with large tables containing millions of objects. There, searching, sorting, ordering, counting and other features make the admin load slowly and put pressure on the database. At that point the admin needs optimizing, and many small changes can speed up admin load times and reduce database load. This blog post describes approaches to the common performance problems I've run into with a large database.

Define Which Fields Are Sortable

By default, the Django admin lets users sort by any column in the changelist with a single click. With a large dataset this can backfire: ordering on a non-indexed field sends a heavy query to the database, putting pressure on it and potentially timing out the request. If many admin users sort simultaneously, the effect can cascade and degrade both web server and database performance. To restrict sorting, define which fields are sortable with sortable_by:

class BaseAdmin(ModelAdmin):
    sortable_by: tuple = ("created", "modified")

This is defined on a BaseAdmin class that each model's admin can inherit from and override.

Remove Ordering or Ensure Ordering on Indexed Fields

Ordering can be heavy on the database, especially with large datasets. By default, the Django admin uses the model's default ordering (Meta.ordering), which you can override in the admin with the ordering attribute. If you define it, use indexed database fields, as in the following example:

class BaseAdmin(ModelAdmin):
    ordering: tuple = ("created", "modified")

Alternatively, remove ordering completely: leave out the ordering attribute on the admin and make sure the model's Meta does not define a default ordering either.

class User(Model):

    class Meta:
        ordering = ["created", "modified"] # Remove this

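Be mindful of how many fields you order by and whether those fields are indexed. Ordering by several fields is typically only fast when a single composite index covers them in the same order.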
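To see why indexed ordering matters, here is a small sketch using SQLite's EXPLAIN QUERY PLAN, chosen only because sqlite3 ships with Python. PostgreSQL's EXPLAIN output looks different (a Sort node rather than a temporary B-tree), and the users table and index name are made up for illustration. Ordering by columns covered by an index needs no sort step, while ordering by a non-indexed column does.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows into one string."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, created TEXT, modified TEXT)"
)
# Hypothetical composite index matching ordering = ("created", "modified").
conn.execute("CREATE INDEX users_created_modified ON users (created, modified)")

# Ordering matches the index: rows are read in index order, no sort step.
indexed_plan = query_plan(
    conn, "SELECT id, created, modified FROM users ORDER BY created, modified LIMIT 20"
)
# No index on name: SQLite must sort the rows in a temporary B-tree.
unindexed_plan = query_plan(conn, "SELECT id, name FROM users ORDER BY name LIMIT 20")

print(indexed_plan)
print(unindexed_plan)
```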

Reduce List of Objects per Page

By default, the changelist view lists 100 objects per page. Lowering this with list_per_page speeds up load times:

# https://docs.djangoproject.com/en/4.1/ref/contrib/admin/#django.contrib.admin.ModelAdmin.list_per_page
class BaseAdmin(ModelAdmin):
    list_per_page = 20
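Fewer objects per page means a smaller slice of the queryset per request. As a sketch, the arithmetic below mirrors what Django's Paginator.page() does with per_page and orphans (page_slice is a hypothetical helper name); the resulting slice becomes a LIMIT/OFFSET query.

```python
from typing import Tuple


def page_slice(number: int, per_page: int, count: int, orphans: int = 0) -> Tuple[int, int]:
    """Return the (bottom, top) slice indexes used for page `number`."""
    bottom = (number - 1) * per_page
    top = bottom + per_page
    # If the remainder after this page is <= orphans, fold it into this page.
    if top + orphans >= count:
        top = count
    return bottom, top


# The slice translates to LIMIT (top - bottom) OFFSET bottom in SQL.
print(page_slice(1, 100, 4_400_000))  # default list_per_page
print(page_slice(1, 20, 4_400_000))   # list_per_page = 20
```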

Large Table Paginator (Count Estimation)

By default, Django computes an exact count of the objects in the table, which works fine on small tables; performing a count on 80 million objects, however, takes time and slows down the admin. PostgreSQL keeps an estimate of each table's row count in the reltuples column of pg_class (refreshed by VACUUM and ANALYZE, including autovacuum), which is far cheaper to read than computing a precise count. The obvious drawback is that the count is not exact, but does that matter? If a table holds around 4.4 million objects, do you care whether it says 4400000 rather than the actual 4400112? If you do, don't use the estimate. Count estimation also isn't worth it on smaller tables, where an exact count is cheap. To implement this, I've forked a GitHub gist; below is an example.

from django.core.paginator import Paginator
from django.db import connection
from django.utils.functional import cached_property


class LargeTablePaginator(Paginator):
    """
    # https://gist.github.com/noviluni/d86adfa24843c7b8ed10c183a9df2afe
    Overrides the paginator's count to avoid timeouts.
    - Get an estimate instead of actual count when not filtered (this estimate can be stale and hence not fit for
    situations where the count of objects actually matters).
    """

    @cached_property
    def count(self):
        """
        Returns an estimated number of objects, across all pages.
        """
        if not self.object_list.query.where:  # type: ignore
            try:
                with connection.cursor() as cursor:
                    # Obtain estimated values (only valid with PostgreSQL)
                    cursor.execute(
                        "SELECT reltuples FROM pg_class WHERE relname = %s",
                        [self.object_list.query.model._meta.db_table],  # type: ignore
                    )
                    estimate = int(cursor.fetchone()[0])
                    # PostgreSQL 14+ reports -1 for tables never vacuumed or analyzed,
                    # so only trust non-negative estimates.
                    if estimate >= 0:
                        return estimate
            except Exception:  # pylint: disable=broad-except
                # If any other exception occurred fall back to default behaviour
                pass
        return super().count

To switch to LargeTablePaginator dynamically, override the get_paginator method:

from django.conf import settings
from django.contrib.admin import ModelAdmin


class BaseAdmin(ModelAdmin):
    large_table_paginator = False

    def get_paginator(  # pylint: disable=too-many-arguments
        self,
        request,
        queryset,
        per_page,
        orphans=0,
        allow_empty_first_page=True,
    ):
        # Always show the exact count locally (DEBUG is usually True there)
        if self.large_table_paginator and not settings.DEBUG:
            return LargeTablePaginator(
                queryset, per_page, orphans, allow_empty_first_page
            )
        return self.paginator(queryset, per_page, orphans, allow_empty_first_page)

The large table paginator is skipped when DEBUG is True, which is usually the case in local development, so the exact object count is still shown locally. Finally, enable it per model admin by overriding the large_table_paginator flag:

class MyAdmin(BaseAdmin):
    large_table_paginator = True
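The decision LargeTablePaginator makes can be sketched as a plain function, which makes the fallback rules easy to test without a database. paginator_count and exact_count are hypothetical names, and the -1 check assumes PostgreSQL 14 or later, which reports reltuples as -1 for tables that have never been vacuumed or analyzed.

```python
from typing import Callable, Optional


def paginator_count(
    is_filtered: bool,
    estimate: Optional[float],
    exact_count: Callable[[], int],
) -> int:
    """Hypothetical helper: use the cheap estimate only when it is safe to do so."""
    if not is_filtered and estimate is not None and estimate >= 0:
        # Unfiltered changelist: pg_class.reltuples is good enough.
        return int(estimate)
    # Filtered queryset, missing estimate, or -1 (never analyzed):
    # fall back to the exact, expensive COUNT(*).
    return exact_count()


calls = []


def exact_count() -> int:
    calls.append("SELECT COUNT(*) ...")  # stands in for the real query
    return 4_400_112


print(paginator_count(False, 4.4e6, exact_count))  # uses the estimate
print(paginator_count(True, 4.4e6, exact_count))   # filtered: exact count
```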

Disable Full Result Count

The full result count is shown when you search or filter. Next to the search bar the admin displays, for example, 0 results (1 total); the "1 total" requires counting every object in the table, which, as we've seen, is a heavy operation on large tables. Django therefore lets you disable it by setting show_full_result_count to False. The admin then no longer shows how many objects exist in the table and instead offers a link that clears all filters: 0 results (Show all). Below is an example.

class BaseAdmin(ModelAdmin):
    show_full_result_count = False
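Here is a rough model of the difference, using SQLite's trace callback to count executed statements. result_header is a hypothetical stand-in for the changelist header and the users table is made up; Django's real changelist logic is more involved, but the extra full-table COUNT(*) is the query that show_full_result_count = False avoids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")
conn.commit()

executed = []
conn.set_trace_callback(executed.append)  # records each statement SQLite runs


def result_header(term: str, show_full_result_count: bool) -> str:
    """Hypothetical model of the '... results' text and the COUNT queries behind it."""
    matched = conn.execute(
        "SELECT COUNT(*) FROM users WHERE name LIKE ?", (f"%{term}%",)
    ).fetchone()[0]
    if show_full_result_count:
        # The extra full-table count that gets expensive on large tables.
        total = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        return f"{matched} results ({total} total)"
    return f"{matched} results (Show all)"


print(result_header("bob", show_full_result_count=True), len(executed))
executed.clear()
print(result_header("bob", show_full_result_count=False), len(executed))
```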

Remove Date Hierarchy Drilldowns

Django's date_hierarchy is a great admin feature, but it can be expensive for the database and cause long load times in the UI. By default, Django queries the database to find which date spans contain objects. For example, if you had 10 users who joined in January, March and August, Django would select the distinct months in which users joined; drilling into a single month, it would then select the distinct days within that month. Only those months and days are presented in the UI. This sounds great and works fine for small datasets, but with 1 million users the drilldown has to group all of them by day just to determine which days become options in the UI.

Presenting only the days or months that actually contain objects is the correct behavior, but it is not optimal for large datasets. We can bypass it by presenting all dates regardless of whether objects exist for them, skipping the heavy database queries. The Django package django-admin-lightweight-date-hierarchy does exactly this, and a blog post by Haki Benita explains the problem and the solution extensively.
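The trade-off can be sketched with SQLite: the default drilldown asks the database which months contain objects (Django builds a similar DISTINCT query over truncated dates), while the lightweight approach simply lists every month without touching the table. The table, column and dates below are made up for illustration.

```python
import datetime
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, joined TEXT)")
conn.executemany(
    "INSERT INTO users (joined) VALUES (?)",
    [("2023-01-15",), ("2023-03-02",), ("2023-08-30",)],
)

# Default-style drilldown: ask the database which months contain objects.
# On millions of rows this has to scan and group the whole table.
months_with_objects = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT strftime('%Y-%m', joined) FROM users ORDER BY 1"
    )
]

# Lightweight drilldown: offer every month of the year, no query needed.
all_months = [datetime.date(2023, m, 1).strftime("%Y-%m") for m in range(1, 13)]

print(months_with_objects)
print(all_months)
```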

Usage is straightforward. Install the package with poetry add django-admin-lightweight-date-hierarchy and add it to INSTALLED_APPS.

INSTALLED_APPS = (
    ...
    'django_admin_lightweight_date_hierarchy',
    ...
)

Then set the date_hierarchy_drilldown flag to False to disable it.

class BaseAdmin(ModelAdmin):
    date_hierarchy = "created"
    date_hierarchy_drilldown = False # Disable drilldowns

You can also customize which dates are offered. I chose to only show past dates (no objects are created in the future), and only the years since the application had its first objects.

import calendar
import datetime

from django.contrib.admin import ModelAdmin
from django.utils import timezone


class BaseAdmin(ModelAdmin):

    def get_date_hierarchy_drilldown(self, year_lookup, month_lookup):
        """Drill-down only on past dates."""

        today = timezone.now().date()

        if year_lookup is None and month_lookup is None:
            # Application's first year in production
            apps_first_year = 2022
            return (
                datetime.date(y, 1, 1) for y in range(apps_first_year, today.year + 1)
            )

        if year_lookup is not None and month_lookup is None:
            # Past months of selected year.
            this_month = today.replace(day=1)
            return (
                month
                for month in (
                    datetime.date(int(year_lookup), month, 1) for month in range(1, 13)
                )
                if month <= this_month
            )

        if year_lookup is not None and month_lookup is not None:
            # Past days of selected month; cast in case the lookups arrive as strings.
            year, month = int(year_lookup), int(month_lookup)
            days_in_month = calendar.monthrange(year, month)[1]
            return (
                day
                for day in (
                    datetime.date(year, month, i + 1)
                    for i in range(days_in_month)
                )
                if day <= today
            )

The image below shows the downside of offering all date spans.

[Image: date hierarchy preview]

In the image the table holds only 2 objects, yet every month is offered as a drilldown option, because the package skips the query that checks which dates contain objects. With just 2 objects, running that query and offering only the single matching month is clearly better. However, as mentioned earlier, that query only stays cheap on small datasets, so enable or disable the drilldown per model admin depending on table size.

Cache Model Properties

Defining custom model properties and using them to display fields in the admin is a common practice. But if a property runs a query and is used in several places in the admin, the query runs again on every access, causing excessive database queries. Such properties can be cached with the @cached_property decorator instead. Let's take a look at how to use it.

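from django.db.models import Model
from django.utils.functional import cached_property


class MyModel(Model):

    @cached_property
    def computational_heavy_query(self):
        """
        Run a heavy query and cache the evaluated result on this instance
        """
        # Managers are only reachable from the class, not from instances;
        # list() evaluates the lazy queryset so the rows themselves are cached.
        return list(
            type(self).objects.filter(
                tag__in=['a', 'b', 'c'],
                name__icontains='test'
            )
            .order_by("-created", "-modified")
            .distinct()
        )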

Now in the admin we'll use this property in multiple fields.

from django.contrib.admin import ModelAdmin
from django.utils.html import format_html


class BaseAdmin(ModelAdmin):

    fields = ("computational_heavy_query_field_1", "computational_heavy_query_field_2")
    readonly_fields = ("computational_heavy_query_field_1", "computational_heavy_query_field_2")

    def computational_heavy_query_field_1(self, obj):
        # format_html escapes the value, unlike mark_safe around an f-string
        return format_html("<p>{} 1</p>", obj.computational_heavy_query)

    def computational_heavy_query_field_2(self, obj):
        # Same cached property: no second query for this instance
        return format_html("<p>{} 2</p>", obj.computational_heavy_query)

Because the model property is decorated with cached_property, the heavy query runs only once even though the admin uses the property twice. The value is computed once per instance of the object and persists for as long as that instance exists; without the decorator, the query would run twice. Note that querysets are lazy, so have the property return evaluated results (for example by wrapping the queryset in list()); otherwise the cached value is an unevaluated queryset, and operations such as printing it or calling .count() on it can still hit the database each time. This is a great way to speed up heavy queries at the model level, which in turn speeds up the admin!
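A minimal sketch of the caching behavior, using the standard library's functools.cached_property, which, like Django's django.utils.functional.cached_property, stores the computed value on the instance. run_heavy_query and Report are hypothetical stand-ins for a model and its query.

```python
from functools import cached_property

QUERY_LOG = []


def run_heavy_query():
    """Stand-in for an expensive database query; logs every execution."""
    QUERY_LOG.append("SELECT ...")
    return ["row-1", "row-2"]


class Report:
    @cached_property
    def cached_rows(self):
        # Evaluated into a list, so the cache holds the results themselves.
        return list(run_heavy_query())

    @property
    def uncached_rows(self):
        return list(run_heavy_query())


report = Report()
for _ in range(2):
    report.cached_rows    # first access queries, second is served from the cache
for _ in range(2):
    report.uncached_rows  # queries on every access
Report().cached_rows      # a new instance has its own, empty cache
print(len(QUERY_LOG))
```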

Search

Django admin has built-in search support. You configure it with the search_fields attribute, and by default it applies an icontains lookup for each searched word. This works fairly well but becomes slow on large datasets, which is exactly where search tends to be needed. Let's look at how to optimize it.

Minimize Number of Search Fields

The more fields you add to search_fields, the more icontains comparisons each search runs against those fields. Only add the fields that will be used extensively; extra fields add convenient functionality but slow down every search.
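To make the cost concrete, here is a sketch of how search_fields and the search term combine. To my understanding, ModelAdmin.get_search_results ORs the lookups for every field within each search word and ANDs the words together, so the number of comparisons grows with fields times words. search_lookups is a hypothetical helper, and it splits on whitespace only, whereas Django's splitting also respects quoted phrases.

```python
from typing import List, Sequence


def search_lookups(search_fields: Sequence[str], search_term: str) -> List[List[str]]:
    """Hypothetical helper: one OR-group per search word; the groups are ANDed."""
    return [
        [f"{field}__icontains={word!r}" for field in search_fields]
        for word in search_term.split()
    ]


groups = search_lookups(("name", "email", "company__name"), "acme corp")
for group in groups:
    print(" OR ".join(group))
print(sum(len(group) for group in groups), "icontains comparisons")
```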

Search Field Lookups

By default, Django uses icontains in the database query against the fields defined in search_fields. It's the best generic option but not optimal for every kind of field. For UUIDs, numbers or static strings, for example, you might want to use exact instead, which is a much less expensive query than icontains and can make use of a regular index. Here's an example:

class BaseAdmin(ModelAdmin):

    search_fields = ("id__exact", "company_id__exact", "name")

Below is a list of common lookups you could use, depending on the field; the Django documentation on field lookups describes each of them.

list_of_lookups = ['exact', 'iexact', 'gt', 'gte', 'lt', 'lte', 'in', 'contains', 'icontains', 'startswith', 'istartswith', 'endswith', 'iendswith', 'range', 'isnull', 'regex', 'iregex', 'contained_by']
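Search field entries can also carry a prefix: Django documents ^ for istartswith, = for iexact and @ for full-text search. The simplified sketch below shows how an entry maps to a lookup. construct_search and KNOWN_LOOKUPS are hypothetical names, and Django's own implementation validates lookups against the actual model fields rather than a fixed set.

```python
# Hypothetical subset of lookups recognised at the end of a search_fields entry.
KNOWN_LOOKUPS = {"exact", "iexact", "icontains", "startswith", "istartswith",
                 "in", "gt", "gte", "lt", "lte"}


def construct_search(field_name: str) -> str:
    """Simplified sketch of turning a search_fields entry into a lookup."""
    if field_name.startswith("^"):
        return f"{field_name[1:]}__istartswith"
    if field_name.startswith("="):
        return f"{field_name[1:]}__iexact"
    if field_name.startswith("@"):
        return f"{field_name[1:]}__search"
    if field_name.rsplit("__", 1)[-1] in KNOWN_LOOKUPS:
        return field_name  # explicit lookup such as "id__exact" is kept as-is
    return f"{field_name}__icontains"  # the default


for entry in ("id__exact", "company_id__exact", "name", "=email", "^slug"):
    print(entry, "->", construct_search(entry))
```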

DjangoQL

The out-of-the-box search works fine but becomes tricky with complex queries. For example, searching for objects where several fields are True is impossible or very tricky with the default admin. The DjangoQL package adds admin integration that turns the search box into an advanced search with autocomplete and its own query language. The package can be found on GitHub. With it you can write queries such as new = True and date_published ~ "2023-09" to find, say, any blog post that has the field new set to True and was published in September 2023. It is simple to install and integrate with the Django admin: install it with poetry add djangoql and add it to INSTALLED_APPS.

INSTALLED_APPS = [
    ...
    'djangoql',
    ...
]

And then add the mixin to the admin:

from djangoql.admin import DjangoQLSearchMixin

class BaseAdmin(DjangoQLSearchMixin, ModelAdmin):
    ...

There are many other tweaks and customizations you can make with the DjangoQL package, described in depth on the package's GitHub page.

Prefetching and Selecting Related Objects

In the Django admin we can display related objects using inlines, or reach related fields with double-underscore lookups such as company__name. Doing so can create inefficient queries, for example one extra query per row to fetch each related object. Django provides two options for fetching related objects efficiently: select_related and prefetch_related.

There's a wide array of resources on prefetching and selecting related objects. I'll provide a brief summary and a usage example below.

Select Related

select_related joins the related tables (for foreign key and one-to-one relations) and fetches the required fields from both the main table and the related table in a single query. It makes sense to use it when fetching a list of objects together with the related object of each object in that list. There are two approaches. The first is list_select_related, which tells Django to use select_related only on the list page.

class MyAdmin(BaseAdmin):
    list_select_related = ('company',)

The other approach is to override the get_queryset method and select related objects there, which applies not only to the list page but to every admin page for that model.

class MyAdmin(BaseAdmin):

    def get_queryset(self, request):
        return super().get_queryset(request).select_related("company")
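The difference between the two access patterns can be sketched with SQLite, counting executed statements via the trace callback. The company and account tables are made up for illustration: reading each row's company separately costs one query per row (the N+1 problem), while a JOIN, which is the kind of query select_related produces, fetches everything in one go.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
    INSERT INTO company (id, name) VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO account (name, company_id) VALUES ('ann', 1), ('bob', 2), ('cat', 1);
    """
)

queries = []
conn.set_trace_callback(queries.append)  # record each executed statement

# Without select_related: one query for the list, then one per row.
rows = conn.execute("SELECT name, company_id FROM account ORDER BY id").fetchall()
n_plus_one = [
    (name, conn.execute("SELECT name FROM company WHERE id = ?", (cid,)).fetchone()[0])
    for name, cid in rows
]
n_plus_one_queries = len(queries)

queries.clear()
# With a JOIN: both tables in a single query.
joined = conn.execute(
    "SELECT account.name, company.name FROM account "
    "JOIN company ON company.id = account.company_id ORDER BY account.id"
).fetchall()
join_queries = len(queries)

print(n_plus_one_queries, join_queries)
```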

Prefetch Related

prefetch_related works differently: it also supports many-to-many (and reverse foreign key) relations, which select_related does not. Instead of joining in SQL, it performs the join in Python. It first fetches the filtered objects from the primary table, then passes all of their IDs to a query on the related table, fetching only the rows related to those objects. That makes two queries in total (one extra query per prefetched relation), after which the two lists of objects are matched up in Python.

class MyAdmin(BaseAdmin):

    def get_queryset(self, request):
        return super().get_queryset(request).prefetch_related(
            'friends'
        )
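The two-query "join in Python" can be sketched with SQLite and a dictionary. The person and friendship tables are hypothetical (friendship plays the role of the M2M through table, and friendships are one-directional for simplicity); prefetch_related does the equivalent bookkeeping for you.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friendship (person_id INTEGER, friend_id INTEGER);
    INSERT INTO person (id, name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO friendship VALUES (1, 2), (1, 3), (2, 3);
    """
)

# Query 1: the objects on the current admin page.
people = conn.execute("SELECT id, name FROM person ORDER BY id").fetchall()
ids = [person_id for person_id, _ in people]

# Query 2: related rows for all of those IDs at once.
placeholders = ", ".join("?" for _ in ids)
related = conn.execute(
    "SELECT f.person_id, p.name FROM friendship f "
    "JOIN person p ON p.id = f.friend_id "
    f"WHERE f.person_id IN ({placeholders}) ORDER BY p.id",
    ids,
).fetchall()

# The join in Python: group related rows by the object they belong to.
friends_by_person = defaultdict(list)
for person_id, friend_name in related:
    friends_by_person[person_id].append(friend_name)

result = {name: friends_by_person[person_id] for person_id, name in people}
print(result)
```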

Summary

This blog post walked through multiple optimization approaches that make sorting, ordering, searching, counting and other operations faster. They might not be needed initially, but as projects grow, most UI views need optimizing in terms of how objects are fetched and filtered. Some of these come with drawbacks, but I don't think they affect functionality much. Hopefully, these are easy wins for Django admin performance in your project, and feel free to share any other generic optimization solutions!
