Kubernetes Cost Tracking Simplified with OpenCost, Prometheus, and Grafana

2 months ago
4 min read

OpenCost is an open-source tool designed to help you monitor and understand the cost of your cloud infrastructure. As a project under the Cloud Native Computing Foundation (CNCF), OpenCost offers a transparent and powerful solution for cloud cost management. It provides both a user-friendly interface for visualizing cloud costs and Prometheus metrics, so you can query and visualize these costs using Grafana. The popular tool Kubecost is built on top of OpenCost, offering an enhanced feature set and user experience. However, Kubecost is not open-source, and its free plan has limitations on data retention and storage. Given these constraints and a preference for consolidating data visualization within Grafana, I opted to use OpenCost.

This blog post introduces the opencost-mixin - a set of Prometheus rules and Grafana dashboards for OpenCost. The dashboards provide both an overview of cluster cost and a breakdown of cost by namespace, node, pod and container. In addition to cost visualization, the opencost-mixin includes alerts for budget overruns and cost anomalies. For example, you can set budget alerts that fire when your costs approach predefined thresholds, or detect anomalies such as a sudden 20% increase in cluster expenses. Two dashboards are already published on Grafana:

  • OpenCost Overview - An overview of the Kubernetes cluster cost with a breakdown by instance type, resource type (RAM/CPU/Persistent Volume) and namespace.
  • OpenCost Namespace - Provides insight into namespace costs with a breakdown by pods/containers/persistent volumes for that namespace.

Prometheus alerts for the opencost-mixin are available in GitHub. These cover cost anomalies (e.g., sudden spikes) and budget alerts (e.g., exceeding thresholds) and can be easily imported into your setup for proactive cost monitoring.

If you want to go directly to the dashboards you can use the links above. The rest of the blog post describes setting up OpenCost and the various alerts and dashboards.

Installing OpenCost

First, add the OpenCost Helm chart repository:

helm repo add opencost https://opencost.github.io/opencost-helm-chart

The following Helm values are set:

opencost:
  metrics:
    serviceMonitor:
      enabled: true
  prometheus:
    internal:
      enabled: true
      namespaceName: monitoring
      port: 9090
      serviceName: prometheus-k8s

We use the Prometheus-operator and have a Prometheus instance running in the monitoring namespace. We enable the ServiceMonitor and set internal.enabled to true to let OpenCost know that we have an internal Prometheus instance running and that we do not need a Prometheus instance deployed with the OpenCost chart.

Install OpenCost with the following command:

helm install opencost opencost/opencost -f values.yaml

Now you should be able to go to your Prometheus instance and query cost metrics - for example node_total_hourly_cost, which reports the total hourly cost of a node.
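If you would rather check the metric from a script than from the Prometheus UI, here is a minimal sketch that uses the Prometheus HTTP API instant-query endpoint (/api/v1/query). The function names are my own, and the sketch assumes the node_total_hourly_cost series carries a node label; adjust the label if your series look different.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def parse_hourly_costs(payload: dict, label: str = "node") -> dict:
    """Map each series' label value to its sample value as a float.

    Assumes an instant-vector response; `label` is an assumption about
    which label identifies the node.
    """
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    costs = {}
    for series in payload["data"]["result"]:
        name = series["metric"].get(label, "<unknown>")
        _timestamp, value = series["value"]  # value arrives as a string
        costs[name] = float(value)
    return costs


def fetch_node_hourly_costs(base_url: str) -> dict:
    """Query Prometheus for node_total_hourly_cost and return {node: cost}."""
    url = build_query_url(base_url, "node_total_hourly_cost")
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_hourly_costs(json.load(response))
```

With the Helm values above, the base URL from inside the cluster would likely be something like http://prometheus-k8s.monitoring.svc:9090, so fetch_node_hourly_costs("http://prometheus-k8s.monitoring.svc:9090") should return one entry per node once OpenCost has been scraped.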

Grafana Dashboards

OpenCost Overview Dashboard

The OpenCost overview dashboard focuses on providing an overview of your Kubernetes cluster. The dashboard has three core sections:

  • Cluster Summary - Summarizes the costs of the whole cluster. It shows pie chart panels that group the costs by resource/namespace/instance type. It also shows the overall cost variance and the cost variance for each resource - for example the increase/decrease in cost for CPU over time.
  • Cloud Resources - Visualizes the instances deployed in the cloud and the costs associated with them. It also shows the persistent volumes and the costs associated with them.
  • Namespace - Provides a breakdown of costs by namespace. There are also direct links to the namespace dashboard, which provides a more detailed view of each namespace.

OpenCost Namespace Dashboard

The OpenCost namespace dashboard provides a more detailed view of the costs for a specific namespace. The dashboard is split into the following sections:

  • Filters - Allows us to filter by namespace, which is applied to all the panels.
  • Summary - An overview of the namespace cost - hourly, daily and monthly costs as well as costs grouped by resource (CPU/RAM/PV).
  • Pod - A summary of the 10 most expensive pods, including their current costs and a comparison of cost changes over the last 7 and 30 days.
  • Container - A summary of the 10 most expensive containers, including their current costs and a comparison of cost changes over the last 7 and 30 days.
  • Persistent Volumes - An overview of which persistent volumes are deployed into the namespace and the cost of each persistent volume.

Alerts

The alerts can be customized using the config.libsonnet file available in the repository. If you're familiar with Jsonnet, tailoring the alerts to your requirements should be straightforward. You will need to adjust the alerts to fit your Kubernetes cluster. The alerts can be found on GitHub, and each one is described below.

  • Alert name: OpenCostMonthlyBudgetExceeded

Alerts when the projected monthly cost (the current hourly cost multiplied by 730 hours) exceeds the budget threshold. The threshold is currently set to $200 as an example - you need to configure this for your cluster.

  • Alert name: OpenCostAnomalyDetected

Alerts when the average hourly cost over the last 3 hours exceeds the 7-day average by more than 20%. The 20% threshold can be adjusted.
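To make the arithmetic behind the two alerts concrete, here is a small Python sketch of the conditions as described above. It is only an illustration, not the PromQL shipped in the mixin. The function names are hypothetical, and the defaults mirror the example values from this post ($200 budget, 20% increase).

```python
HOURS_PER_MONTH = 730  # the monthly projection factor described above


def projected_monthly_cost(hourly_cost: float) -> float:
    """Project a monthly cost from the current hourly cost."""
    return hourly_cost * HOURS_PER_MONTH


def budget_exceeded(hourly_cost: float, monthly_budget: float = 200.0) -> bool:
    """True when the projected monthly cost is above the budget."""
    return projected_monthly_cost(hourly_cost) > monthly_budget


def anomaly_detected(avg_3h: float, avg_7d: float, threshold: float = 0.20) -> bool:
    """True when the 3-hour average exceeds the 7-day average by more than `threshold`."""
    if avg_7d <= 0:
        # No meaningful baseline to compare against (a choice made for this sketch).
        return False
    return (avg_3h - avg_7d) / avg_7d > threshold
```

For instance, a cluster costing $0.30 per hour projects to $219 per month, which is over the $200 example budget, while $0.25 per hour projects to $182.50 and stays under it.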

Summary

OpenCost is a great tool for understanding your cloud costs, and the opencost-mixin provides a set of Prometheus rules and Grafana dashboards that can help you visualize and monitor your costs. The dashboards provide an overview of the cluster costs, as well as a breakdown by namespace, pod, container, and persistent volume. The alerts can help you enforce budgets and detect cost anomalies. The dashboards and alerts are a work in progress, so feel free to share feedback in the opencost-mixin repository about what you would like to see or any issues you experience.

