
NestJS Apollo GraphQL Prometheus Metrics and Grafana Dashboards

Published on April 30, 2022, 00:00 UTC · 4 minute read

Apollo GraphQL and NestJS are gaining traction quickly, but the monitoring story is still unclear. At the time of writing (late 2021 / early 2022) there are no default exporters or libraries for Prometheus metrics, and the same goes for Grafana dashboards; this blog post provides both. To be clear, Apollo Studio does provide metrics and many other features for your graphs. The downsides are that you will most likely end up on a paid plan and be locked in to their offering, and there is no way to export the metrics to your own Prometheus instance to centralize alerting and dashboards.

This blog post is based on a NestJS implementation that uses dependency injection for the Prometheus metrics, but the approach should work similarly in other setups.

Creating your Prometheus metrics

We will use three OSS repositories to create our metrics:

  • prom-client: The default NodeJS Prometheus library.
  • @willsoto/nestjs-prometheus: NestJS Prometheus integration for injecting metrics.
  • apollo-metrics: Prometheus counters/histograms for each stage of the Apollo GraphQL request lifecycle.

The sample metrics below are extracted from apollo-metrics; head into that repository, grab all of the metrics, and define them.

export const parsedCounter = makeCounterProvider({
  name: 'graphql_queries_parsed',
  help: 'The amount of GraphQL queries that have been parsed.',
  labelNames: ['operation_name', 'operation'],
});

export const validationStartedCounter = makeCounterProvider({
  name: 'graphql_queries_validation_started',
  help: 'The amount of GraphQL queries that have started validation.',
  labelNames: ['operation_name', 'operation'],
});

export const resolvedCounter = makeCounterProvider({
  name: 'graphql_queries_resolved',
  help: 'The amount of GraphQL queries that have had their operation resolved.',
  labelNames: ['operation_name', 'operation'],
});

export const executionStartedCounter = makeCounterProvider({
  name: 'graphql_queries_execution_started',
  help: 'The amount of GraphQL queries that have started executing.',
  labelNames: ['operation_name', 'operation'],
});
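The plugin and module in the next sections also use two more counters, errorsCounter and respondedCounter, which are not shown above. They follow the exact same pattern; a sketch (the help text here is my wording, the metric names match the ones injected later):

```typescript
export const errorsCounter = makeCounterProvider({
  name: 'graphql_queries_errors',
  help: 'The amount of GraphQL queries that have encountered errors.',
  labelNames: ['operation_name', 'operation'],
});

export const respondedCounter = makeCounterProvider({
  name: 'graphql_queries_responded',
  help: 'The amount of GraphQL queries that have been responded to.',
  labelNames: ['operation_name', 'operation'],
});
```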

The metrics above are aligned with the Apollo GraphQL request lifecycle. Each request goes through a nine-step lifecycle, and the metrics map to those steps. Apollo has an in-depth guide describing each step, which you can find here. The Prometheus metric help text should also provide enough information to understand each lifecycle step.

NestJS Dependency Injection

Now we’ll create a NestJS plugin called GraphQLPrometheusMetricsPlugin: a class that implements ApolloServerPlugin and injects the metrics defined above. The plugin will later be registered with the Apollo server and will increment the relevant metric for each request.

import { Injectable } from '@nestjs/common';
import { Plugin } from '@nestjs/graphql';
import { InjectMetric } from '@willsoto/nestjs-prometheus';
import { Counter } from 'prom-client';
import {
  ApolloServerPlugin,
  GraphQLRequestListener,
} from 'apollo-server-plugin-base';

@Injectable()
@Plugin()
export class GraphQLPrometheusMetricsPlugin implements ApolloServerPlugin {
  constructor(
    @InjectMetric('graphql_queries_parsed')
    public parsedCounter: Counter<string>,
    @InjectMetric('graphql_queries_validation_started')
    public validationStartedCounter: Counter<string>,
    @InjectMetric('graphql_queries_resolved')
    public resolvedCounter: Counter<string>,
    @InjectMetric('graphql_queries_execution_started')
    public executionStartedCounter: Counter<string>,
    @InjectMetric('graphql_queries_errors')
    public errorsCounter: Counter<string>,
    @InjectMetric('graphql_queries_responded')
    public respondedCounter: Counter<string>,
  ) {}

  async requestDidStart(): Promise<GraphQLRequestListener<any>> {
    // Capture the injected counters in local variables so the listener
    // callbacks below do not depend on `this`.
    const parsedCounter = this.parsedCounter;
    const validationStartedCounter = this.validationStartedCounter;
    const resolvedCounter = this.resolvedCounter;
    const executionStartedCounter = this.executionStartedCounter;
    const errorsCounter = this.errorsCounter;
    const respondedCounter = this.respondedCounter;
    return {
      async parsingDidStart(parsingContext): Promise<void> {
        const labels = filterUndefined({
          operation_name: parsingContext.request.operationName || '',
          operation: parsingContext.operation?.operation,
        });
        parsedCounter.inc(labels);
      },
      async validationDidStart(validationContext): Promise<void> {
        const labels = filterUndefined({
          operation_name: validationContext.request.operationName || '',
          operation: validationContext.operation?.operation,
        });
        validationStartedCounter.inc(labels);
      },
      async didResolveOperation(resolveContext): Promise<void> {
        const labels = filterUndefined({
          operation_name: resolveContext.request.operationName || '',
          operation: resolveContext.operation.operation,
        });
        resolvedCounter.inc(labels);
      },
      async executionDidStart(executingContext): Promise<void> {
        const labels = filterUndefined({
          operation_name: executingContext.request.operationName || '',
          operation: executingContext.operation.operation,
        });
        executionStartedCounter.inc(labels);
      },
      async didEncounterErrors(errorContext): Promise<void> {
        const labels = filterUndefined({
          operation_name: errorContext.request.operationName || '',
          operation: errorContext.operation?.operation,
        });
        errorsCounter.inc(labels);
      },
      async willSendResponse(responseContext): Promise<void> {
        const labels = filterUndefined({
          operation_name: responseContext.request.operationName || '',
          operation: responseContext.operation?.operation,
        });
        respondedCounter.inc(labels);
      },
    };
  }
}

As you can see above, each step in the request lifecycle increments the metric for that step. Each metric has two labels by default:

  • operation_name - the name of the GraphQL operation; it is not required, but it is helpful for debugging and logging.
  • operation - the GraphQL operation type, i.e. mutation, query or subscription.

Both labels are highly useful. The operation type comes for free, but the operation name needs to be supplied by the client with each request to your API; Apollo covers both well in its documentation. The labels will also be used in the Grafana dashboards.
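The plugin relies on a small filterUndefined helper to drop labels whose value is undefined, since prom-client rejects undefined label values. The helper is not shown in the snippets above; here is a minimal sketch of what it could look like (the name matches the one used in the plugin, the implementation itself is an assumption):

```typescript
// Drop label keys whose values are undefined so they are never
// passed to prom-client's Counter.inc().
export function filterUndefined(
  labels: Record<string, string | undefined>,
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [key, value] of Object.entries(labels)) {
    if (value !== undefined) {
      result[key] = value;
    }
  }
  return result;
}
```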

Now we can create our NestJS module using the metrics and the plugin:

import { Module } from '@nestjs/common';
import { PrometheusModule } from '@willsoto/nestjs-prometheus';

import {
  GraphQLPrometheusMetricsPlugin,
  validationStartedCounter,
  parsedCounter,
  resolvedCounter,
  executionStartedCounter,
  errorsCounter,
  respondedCounter,
} from './prometheus.plugin';

@Module({
  imports: [PrometheusModule.register()],
  providers: [
    GraphQLPrometheusMetricsPlugin,
    validationStartedCounter,
    parsedCounter,
    resolvedCounter,
    executionStartedCounter,
    errorsCounter,
    respondedCounter,
  ],
  exports: [GraphQLPrometheusMetricsPlugin],
})
export class PromModule {}

And lastly add it to our application:

import { GraphQLPrometheusMetricsPlugin } from './metrics/prometheus.plugin';
...

@Module({
  imports: [
    ...
    PromModule,
    GraphQLGatewayModule.forRootAsync({
      imports: [
        ...
        PromModule,
      ],
      useFactory: async (
        ...
        graphQLPrometheusMetrics: GraphQLPrometheusMetricsPlugin,
      ) => {
        return {
          server: {
            ...
            plugins: [graphQLPrometheusMetrics],
          },
        };
      },
      inject: [..., GraphQLPrometheusMetricsPlugin],
    }),
  ],
})
export class AppModule {}

Now we should have Prometheus metrics exposed at the /metrics endpoint, ready to be scraped by Prometheus. Try querying your Prometheus instance for graphql_queries_execution_started; you should see results like:

graphql_queries_execution_started{container="gateway", endpoint="gateway-http", instance="redacted", job="gateway", namespace="redacted", operation="mutation", operation_name="redacted", pod="redacted", service="gateway"} 10231
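With the counters scraped, the two labels make it easy to derive per-operation rates in PromQL. For example (metric names as defined above; the 5 minute window is a typical choice, adjust it to your scrape interval):

```promql
# Requests per second, per operation name and type
sum by (operation_name, operation) (
  rate(graphql_queries_execution_started[5m])
)

# Percentage of requests that encountered errors
100 * sum(rate(graphql_queries_errors[5m]))
    / sum(rate(graphql_queries_responded[5m]))
```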

Grafana Dashboard

By now the metrics should be in your Prometheus instance, ready to be queried and turned into a dashboard. I’ve created a sample dashboard that covers requests and errors summed by operation and operation name.


The dashboard can be found here.

The dashboard should cover the basics; feel free to share yours if you’ve created a better one!
