
NestJS Apollo GraphQL Prometheus Metrics and Grafana Dashboards


Apollo GraphQL and NestJS are gaining traction quickly, but the monitoring approaches are unclear. At the moment (late 2021 / early 2022) there are no default exporters or libraries for Prometheus metrics, and the same goes for Grafana dashboards. This blog post provides both. Note that Apollo Studio does provide metrics and many other features for your graphs. The downside is that you'll most likely end up on a paid plan and be locked in to their offering. There is also no way of exporting those metrics to your own Prometheus instance to centralize alerting and dashboards.

This blog post is based on a NestJS implementation that uses dependency injection for the Prometheus metrics, but it should work similarly in other setups.

Creating your Prometheus metrics

We will use three OSS repositories to create our metrics:

  • prom-client: The default Prometheus client library for Node.js.
  • @willsoto/nestjs-prometheus: NestJS Prometheus integration for injecting metrics.
  • apollo-metrics: Prometheus counters/histograms for each stage of the Apollo GraphQL request lifecycle.

The sample metrics below are taken from apollo-metrics. Head into that repository, grab all of the metrics, and define them in the same way.

import { makeCounterProvider } from '@willsoto/nestjs-prometheus';

export const parsedCounter = makeCounterProvider({
  name: 'graphql_queries_parsed',
  help: 'The amount of GraphQL queries that have been parsed.',
  labelNames: ['operation_name', 'operation'],
});

export const validationStartedCounter = makeCounterProvider({
  name: 'graphql_queries_validation_started',
  help: 'The amount of GraphQL queries that have started validation.',
  labelNames: ['operation_name', 'operation'],
});

export const resolvedCounter = makeCounterProvider({
  name: 'graphql_queries_resolved',
  help: 'The amount of GraphQL queries that have had their operation resolved.',
  labelNames: ['operation_name', 'operation'],
});

export const executionStartedCounter = makeCounterProvider({
  name: 'graphql_queries_execution_started',
  help: 'The amount of GraphQL queries that have started executing.',
  labelNames: ['operation_name', 'operation'],
});

The metrics above follow the Apollo GraphQL request lifecycle. Each request goes through a 9-step lifecycle, and each metric corresponds to one of those steps. Apollo has an in-depth guide for each step and what it means, which you can find here. The help text of each Prometheus metric should also provide enough information to understand the corresponding lifecycle step.
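
To make the alignment concrete, the sketch below lists the plugin hooks used later in this post next to the counter names they increment. The hook and metric names come from this post's own code. The hookToMetric object and the metricForHook helper are hypothetical names introduced only for illustration, and the list covers only the hooks instrumented here, not the full lifecycle.

```typescript
// Apollo Server plugin hooks instrumented in this post, mapped to the
// Prometheus counter each one increments. The names mirror the
// @InjectMetric() calls in the plugin class.
const hookToMetric = {
  parsingDidStart: 'graphql_queries_parsed',
  validationDidStart: 'graphql_queries_validation_started',
  didResolveOperation: 'graphql_queries_resolved',
  executionDidStart: 'graphql_queries_execution_started',
  didEncounterErrors: 'graphql_queries_errors',
  willSendResponse: 'graphql_queries_responded',
} as const;

type InstrumentedHook = keyof typeof hookToMetric;

// Hypothetical helper: look up the counter name for a given hook.
function metricForHook(hook: InstrumentedHook): string {
  return hookToMetric[hook];
}

console.log(metricForHook('didEncounterErrors')); // graphql_queries_errors
```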

NestJS Dependency Injection

Now we'll create a NestJS plugin called GraphQLPrometheusMetricsPlugin. It uses the metrics above in a class that implements the ApolloServerPlugin interface. The plugin will later be registered with the Apollo server, where it increments the metrics for each request.

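// Import paths below assume @nestjs/graphql v9 and Apollo Server 3; adjust them for your versions.
import { Injectable } from '@nestjs/common';
import { Plugin } from '@nestjs/graphql';
import { InjectMetric } from '@willsoto/nestjs-prometheus';
import { ApolloServerPlugin, GraphQLRequestListener } from 'apollo-server-plugin-base';
import { Counter } from 'prom-client';
// filterUndefined is a small helper that is not shown in this post.

@Injectable()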
@Plugin()
export class GraphQLPrometheusMetricsPlugin implements ApolloServerPlugin {
  constructor(
    @InjectMetric('graphql_queries_parsed')
    public parsedCounter: Counter<string>,
    @InjectMetric('graphql_queries_validation_started')
    public validationStartedCounter: Counter<string>,
    @InjectMetric('graphql_queries_resolved')
    public resolvedCounter: Counter<string>,
    @InjectMetric('graphql_queries_execution_started')
    public executionStartedCounter: Counter<string>,
    @InjectMetric('graphql_queries_errors')
    public errorsCounter: Counter<string>,
    @InjectMetric('graphql_queries_responded')
    public respondedCounter: Counter<string>,
  ) {}

  async requestDidStart(): Promise<GraphQLRequestListener<any>> {
    const parsedCounter = this.parsedCounter;
    const validationStartedCounter = this.validationStartedCounter;
    const resolvedCounter = this.resolvedCounter;
    const executionStartedCounter = this.executionStartedCounter;
    const errorsCounter = this.errorsCounter;
    const respondedCounter = this.respondedCounter;
    // The histograms from apollo-metrics are not injected in this sample, so only the counters are used.
    return {
      async parsingDidStart(parsingContext): Promise<void> {
        const labels = filterUndefined({
          operation_name: parsingContext.request.operationName || '',
          operation: parsingContext.operation?.operation,
        });
        parsedCounter.inc(labels);
        return;
      },
      async validationDidStart(validationContext): Promise<void> {
        const labels = filterUndefined({
          operation_name: validationContext.request.operationName || '',
          operation: validationContext.operation?.operation,
        });
        validationStartedCounter.inc(labels);
        return;
      },
      async didResolveOperation(resolveContext): Promise<void> {
        const labels = filterUndefined({
          operation_name: resolveContext.request.operationName || '',
          operation: resolveContext.operation.operation,
        });
        resolvedCounter.inc(labels);
        return;
      },
      async executionDidStart(executingContext): Promise<void> {
        const labels = filterUndefined({
          operation_name: executingContext.request.operationName || '',
          operation: executingContext.operation.operation,
        });
        executionStartedCounter.inc(labels);
        return;
      },
      async didEncounterErrors(errorContext): Promise<void> {
        const labels = filterUndefined({
          operation_name: errorContext.request.operationName || '',
          operation: errorContext.operation?.operation,
        });
        errorsCounter.inc(labels);
        return;
      },
      async willSendResponse(responseContext): Promise<void> {
        const labels = filterUndefined({
          operation_name: responseContext.request.operationName || '',
          operation: responseContext.operation?.operation,
        });
        respondedCounter.inc(labels);
        return;
      },
    };
  }
}

As shown above, each step in the request lifecycle increments the corresponding metric. Each metric has two labels by default:

  • operation_name - the name of the GraphQL operation. It is optional, but it is helpful for debugging and logging.
  • operation - the type of the GraphQL operation, i.e. query, mutation or subscription.

Both labels are highly useful for the metrics. The operation comes by default, but the operation name must be set by the client on each request to your API. Apollo covers both very well in their documentation. The labels will also be used in the Grafana dashboards.
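
The plugin code calls a filterUndefined helper whose definition is not shown in this post. Below is a minimal sketch of what such a helper might look like, assuming its job is to drop label entries whose value is undefined (for example, operation is not yet known when parsing starts). The implementation and the Labels type are assumptions for illustration, not the original helper.

```typescript
// Hypothetical stand-in for the filterUndefined helper used by the plugin.
// Assumption: it removes entries whose value is undefined so that only
// defined label values are passed to the counter.
type Labels = Record<string, string>;

function filterUndefined(input: Record<string, string | undefined>): Labels {
  const result: Labels = {};
  for (const key of Object.keys(input)) {
    const value = input[key];
    // Empty strings are kept on purpose: the plugin falls back to ''
    // when a request has no operation name.
    if (value !== undefined) {
      result[key] = value;
    }
  }
  return result;
}

const exampleLabels = filterUndefined({
  operation_name: 'GetUser',
  operation: undefined,
});
console.log(JSON.stringify(exampleLabels)); // {"operation_name":"GetUser"}
```

With a helper like this, a hook that runs before the operation is resolved simply increments the counter without the operation label.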

Now we can create our NestJS module using the metrics and the plugin:

import { Module } from '@nestjs/common';
import { PrometheusModule } from '@willsoto/nestjs-prometheus';

import {
  GraphQLPrometheusMetricsPlugin,
  validationStartedCounter,
  parsedCounter,
  resolvedCounter,
  executionStartedCounter,
  errorsCounter,
  respondedCounter,
} from './prometheus.plugin';

@Module({
  imports: [PrometheusModule.register()],
  providers: [
    GraphQLPrometheusMetricsPlugin,
    validationStartedCounter,
    parsedCounter,
    resolvedCounter,
    executionStartedCounter,
    errorsCounter,
    respondedCounter,
  ],
  exports: [GraphQLPrometheusMetricsPlugin],
})
export class PromModule {}

And lastly add it to our application:

import { GraphQLPrometheusMetricsPlugin } from './metrics/prometheus.plugin';
...

@Module({
  imports: [
    ...
    PromModule,
    GraphQLGatewayModule.forRootAsync({
      imports: [
        ...
        PromModule,
      ],
      useFactory: async (
        ...
        graphQLPrometheusMetrics: GraphQLPrometheusMetricsPlugin,
      ) => {
        return {
          server: {
            ...
            plugins: [graphQLPrometheusMetrics],

Now the Prometheus metrics should be exposed at the /metrics endpoint, and you should be able to scrape it with Prometheus. Try querying your Prometheus instance with graphql_queries_execution_started; you should see results like:

graphql_queries_execution_started{container="gateway", endpoint="gateway-http", instance="redacted", job="gateway", namespace="redacted", operation="mutation", operation_name="redacted", pod="redacted", service="gateway"} 10231

Grafana Dashboard

By now the metrics should be in our Prometheus instance, so we can query them and build a dashboard. I've created a sample dashboard that covers requests and errors summed by operation and operation name.


The dashboard can be found here.
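
If you would rather build panels by hand, queries along the following lines are one way to get request and error rates summed by the two labels. This is a sketch under assumptions: the ratePerOperation helper and the 5m window are hypothetical choices, and these are not necessarily the exact queries used in the linked dashboard. The metric and label names come from the code in this post.

```typescript
// Labels defined on every counter in this post.
const groupByLabels = ['operation', 'operation_name'];

// Hypothetical helper: builds a PromQL query for the per-second rate of a
// counter over the given window, summed by operation and operation name.
function ratePerOperation(metric: string, window: string): string {
  return `sum by (${groupByLabels.join(', ')}) (rate(${metric}[${window}]))`;
}

const requestRateQuery = ratePerOperation('graphql_queries_responded', '5m');
const errorRateQuery = ratePerOperation('graphql_queries_errors', '5m');

console.log(requestRateQuery);
// sum by (operation, operation_name) (rate(graphql_queries_responded[5m]))
console.log(errorRateQuery);
// sum by (operation, operation_name) (rate(graphql_queries_errors[5m]))
```

The resulting strings can be pasted into a Grafana panel or the Prometheus expression browser.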

The dashboard should cover the basics. Feel free to share your dashboard if you've created a better one!

