Blog Posts
Kubernetes, Docker, and Tailscale service discovery dashboard with Compass
Kubernetes, Docker, and tailnet services often end up split across ingress objects, container labels, private DNS names, Grafana dashboards, runbooks, and a self-hosted homepage that has to be updated by hand. Compass turns those sources into a searchable service dashboard by discovering Kubernetes HTTPRoute, GRPCRoute, and Ingress resources, Docker containers, Tailscale devices and services, Headscale nodes, static YAML, and JSON APIs.
Karpenter Monitoring: Spot Savings and Node Pool Cost Breakdown
Karpenter now exposes enough pricing data to estimate Kubernetes node costs directly from Prometheus metrics. The existing Karpenter dashboards in the kubernetes-autoscaling-mixin already cover node pools, instance types, and scaling behavior. This post focuses on the new cost dashboard: estimated monthly cost, spot instance savings, and node pool cost breakdown in Grafana.
This builds on the earlier posts on Karpenter monitoring with Prometheus and Grafana, comprehensive Kubernetes autoscaling monitoring, and Kubernetes cost tracking with OpenCost. If you want actual cost allocation, shared resource accounting, and historical cost analysis, use OpenCost and the opencost-mixin. This Karpenter dashboard answers a narrower question: what does the current Karpenter-managed node fleet look like in dollars?
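As a taste of the queries behind the dashboard, here is a rough sketch of a spot-savings expression as a Prometheus recording rule. The karpenter_cloudprovider_instance_type_offering_price_estimate metric name and its labels are assumptions based on recent Karpenter releases, so verify them against your version's /metrics output.

```yaml
# Sketch of a spot-savings recording rule. The metric and label names
# are assumptions based on recent Karpenter releases; check your
# version's /metrics output before using this.
groups:
  - name: karpenter-cost-sketch
    rules:
      # Spot price as a fraction of on-demand per instance type:
      # a value of 0.3 means spot is roughly 70% cheaper.
      - record: karpenter:spot_discount:ratio
        expr: |
          avg by (instance_type) (
            karpenter_cloudprovider_instance_type_offering_price_estimate{capacity_type="spot"}
          )
          /
          avg by (instance_type) (
            karpenter_cloudprovider_instance_type_offering_price_estimate{capacity_type="on-demand"}
          )
```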
Argo Workflows monitoring with Prometheus and Grafana
Argo Workflows exposes enough metrics to see whether workflows are backing up, CronWorkflows are firing, and the controller is keeping up, but raw /metrics output does not make any of that easy to read. This post covers argo-workflows-mixin, a Prometheus and Grafana mixin that adds two dashboards and a focused alert set for Argo Workflows.
The mixin is available on GitHub. It currently ships with two Grafana dashboards and four alert rules.
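To give a flavor of the alerting side, here is a minimal sketch of a workflows-backing-up rule, separate from what the mixin actually ships. The argo_workflows_count metric and its status label are assumptions that vary across Argo Workflows versions, so check your controller's /metrics output.

```yaml
# Minimal sketch of a backlog alert; the metric and label names are
# assumptions and vary across Argo Workflows versions.
groups:
  - name: argo-workflows-sketch
    rules:
      - alert: ArgoWorkflowsBacklogGrowing
        # Workflows sitting in Pending usually mean the controller or
        # the cluster cannot keep up with submissions.
        expr: sum(argo_workflows_count{status="Pending"}) > 0
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: Argo Workflows have been pending for more than 30 minutes.
```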
Monitoring Go runtime with Prometheus and Grafana
Go applications expose a useful set of runtime metrics, but raw /metrics output does not make it easy to spot GC pressure, scheduler latency, memory growth, or file descriptor exhaustion. This post covers a go-mixin for Prometheus and Grafana that adds a dashboard and alerts for the Go runtime.
The mixin is available on GitHub. The dashboard is also published in the Grafana dashboard library. It currently ships with one Grafana dashboard and three alert rules:
Go / Overview - A dashboard for runtime CPU usage, scheduler latency, garbage collection, heap churn, mutex contention, cgo activity, and file descriptor pressure.
GoHighGcCpu - Alerts when a Go process spends too much CPU time in garbage collection.
GoHighSchedulerLatency - Alerts when runnable goroutines wait too long to be scheduled.
GoHighFdUsage - Alerts when a Go process is close to its file descriptor limit.
The repo also includes generated dashboard JSON and Prometheus rule files, so you can either vendor the mixin into your Jsonnet setup or import the generated files directly.
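As an illustration, the file descriptor alert can be expressed against the standard process metrics that client_golang exports out of the box; the 80% threshold below is an illustrative assumption, not necessarily the mixin's default.

```yaml
# Sketch of a file descriptor pressure alert built on the standard
# process_* metrics from client_golang. The 80% threshold is an
# illustrative assumption, not necessarily the mixin's default.
groups:
  - name: go-runtime-sketch
    rules:
      - alert: GoHighFdUsage
        expr: process_open_fds / process_max_fds > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Go process is using over 80% of its file descriptor limit.
```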
Observability for Headscale: Metrics and Dashboards in Grafana
Headscale is an open source, self-hosted control server compatible with Tailscale clients. It lets you run your own Tailnet and have full control over users, nodes, keys, and routing policies without relying on Tailscale's hosted control plane. This post introduces the tailscale-exporter and shows how to collect Headscale metrics via the Headscale gRPC API and visualize everything in Grafana using the dashboards and alerts bundled in the mixin.
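As a rough sketch, scraping both Headscale's built-in metrics endpoint (the metrics_listen_addr setting in its config) and the exporter could look like the following; the hostnames and the exporter port are placeholder assumptions.

```yaml
# Sketch of a Prometheus scrape config; the hostnames and the exporter
# port are placeholder assumptions for illustration.
scrape_configs:
  - job_name: headscale
    # Headscale's own metrics endpoint (metrics_listen_addr in its config).
    static_configs:
      - targets: ["headscale.internal:9090"]
  - job_name: tailscale-exporter
    # Metrics derived from the Headscale gRPC API by the exporter.
    static_configs:
      - targets: ["tailscale-exporter.internal:8080"]
```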
Monitoring Envoy and Envoy Gateway with Prometheus and Grafana
Envoy is a popular open source edge and service proxy that's widely used in modern cloud-native architectures. Envoy Gateway is a controller that manages Envoy proxies in a Kubernetes environment. Monitoring both is crucial for ensuring the reliability and performance of your applications. In this blog post, we'll explore how to monitor Envoy and Envoy Gateway using Prometheus and Grafana, and we'll also introduce a new monitoring mixin for Envoy.
With the retirement of ingress-nginx, many users are looking for alternative ingress controllers. Envoy Gateway is a great option for those who want to leverage the power of Envoy in their Kubernetes clusters. I recently migrated from ingress-nginx and you can read more about it here.
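To show the kind of query such a mixin builds on, here is a sketch of a recording rule for the 5xx share of downstream traffic, using Envoy's standard Prometheus stat names; verify them against your stats configuration.

```yaml
# Sketch of a 5xx-ratio recording rule built on Envoy's standard
# Prometheus stats; verify the names against your stats configuration.
groups:
  - name: envoy-sketch
    rules:
      - record: envoy:downstream_5xx:ratio_rate5m
        expr: |
          sum(rate(envoy_http_downstream_rq_xx{envoy_response_code_class="5"}[5m]))
          /
          sum(rate(envoy_http_downstream_rq_total[5m]))
```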
Replacing Ingress-NGINX with Envoy Gateway in My Personal Cluster
With the retirement of ingress-nginx, many users are looking for alternative ingress controllers. Envoy Gateway looked like a promising option, so I decided to give it a try in my personal Kubernetes cluster. I'll describe my experience deploying Envoy Gateway and how I replicated my previous ingress-nginx setup. This blog post covers my personal cluster; the migration would have been harder in a production environment with more complex requirements.
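The core of the migration is swapping Ingress objects for Gateway API routes. A minimal HTTPRoute equivalent to a single-host Ingress rule looks roughly like this; the gateway, hostname, and backend service names are placeholders, not my actual setup.

```yaml
# Minimal HTTPRoute replacing a single-host Ingress rule; the gateway,
# hostname, and backend service names are placeholders.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: blog
spec:
  parentRefs:
    - name: envoy-gateway   # the Gateway provisioned by Envoy Gateway
  hostnames:
    - blog.example.com
  rules:
    - backendRefs:
        - name: blog
          port: 8080
```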
Visualizing your Tailnet in Grafana
Tailscale is a popular VPN solution that allows you to create secure, encrypted connections between devices. It is based on the WireGuard protocol and is designed to be easy to use and configure. Recently, I've started using Tailscale more extensively both in my personal projects and at work. As a result, I wanted to visualize my Tailnet in Grafana to get better insights into its performance and usage. This post introduces the tailscale-exporter, a tool I built to collect Tailnet metrics directly from the Tailscale API. I’ll also show how to enable scraping of Tailscale client metrics and visualize everything in Grafana for complete observability across your Tailnet.
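One query I find useful once client metrics are flowing: the share of traffic relayed through DERP rather than sent directly, which is a quick health check for NAT traversal. The tailscaled_* metric and path label names follow current Tailscale client metrics and are worth verifying against your version.

```yaml
# Sketch of a recording rule for the DERP-relayed share of traffic;
# the tailscaled_* metric and path label names follow current Tailscale
# client metrics and may differ in your version.
groups:
  - name: tailnet-sketch
    rules:
      - record: tailnet:derp_traffic:ratio_rate5m
        expr: |
          sum(rate(tailscaled_outbound_bytes_total{path="derp"}[5m]))
          /
          sum(rate(tailscaled_outbound_bytes_total[5m]))
```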
Cluster Autoscaler Monitoring with Prometheus and Grafana
Cluster autoscaler is a popular tool for automatically adjusting the size of a Kubernetes cluster based on the current workload. It helps ensure that your applications have enough resources to run efficiently while minimizing costs by scaling down unused nodes. However, monitoring the cluster autoscaler is crucial to ensure that it is functioning correctly and that your applications are running smoothly.
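As a minimal sketch of the kind of alert worth having, the rule below fires when pods sit unschedulable even though the autoscaler should be adding nodes; the metric comes from cluster autoscaler's /metrics endpoint, while the duration is an illustrative assumption.

```yaml
# Sketch of an unschedulable-pods alert; the metric comes from cluster
# autoscaler's /metrics endpoint, the 15m duration is illustrative.
groups:
  - name: cluster-autoscaler-sketch
    rules:
      - alert: ClusterAutoscalerUnschedulablePods
        expr: sum(cluster_autoscaler_unschedulable_pods_count) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Pods have been unschedulable for 15 minutes despite autoscaling.
```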
KEDA Monitoring With Prometheus and Grafana
KEDA is a tool that provides event-driven autoscaling for Kubernetes, allowing you to scale your applications based on external metrics. It uses the Kubernetes Horizontal Pod Autoscaler (HPA) to adjust the number of pods in a deployment based on metrics like CPU usage, memory usage, or custom metrics from external sources. It can also run workloads as jobs in response to event sources like message queues and databases, and defines a new Custom Resource Definition (CRD) called ScaledJob to configure that scaling behavior. Monitoring KEDA effectively is crucial to ensure that your autoscaling policies are working as expected and that your applications are performing optimally.
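For reference, a minimal ScaledJob looks like the following; the RabbitMQ trigger, queue, image, and connection details are placeholder assumptions to show the shape of the resource.

```yaml
# Minimal ScaledJob sketch; the queue, image, and connection details
# are placeholder assumptions illustrating the shape of the resource.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: orders-consumer
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: consumer
            image: ghcr.io/example/orders-consumer:latest
        restartPolicy: Never
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        queueName: orders
        hostFromEnv: RABBITMQ_URL
        mode: QueueLength
        value: "5"
```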