Kubernetes events offer valuable insights into the activities within your cluster, providing a comprehensive view of each resource’s status. While they’re beneficial for debugging individual resources, they suffer from the absence of aggregation. This can lead to issues such as events being garbage collected, the need to view them before they expire, difficulties in filtering and searching, and limited accessibility for other systems. This blog post explores configuring Loki with Alloy to efficiently scrape Kubernetes events and visualize them in Grafana.
This blog post presents an opinionated approach using Loki, Prometheus, and Alloy as the tools of choice. Loki serves as a cost-effective and user-friendly log aggregation system, while Alloy functions as a tool for telemetry collections and Prometheus stores time series data. The post focuses on the additional configuration required, assuming you have already installed Loki, Alloy, and Prometheus.
This blog post also introduces the kubernetes-events-mixin, where you can find a set of Grafana dashboards and Prometheus rules for monitoring Kubernetes events. The mixin won’t work out of the box; it requires the Alloy and Loki configuration described in the rest of this blog post.
Configuring Alloy to Scrape Kubernetes Events
First, use Alloy’s Kubernetes events source to scrape events from the cluster. Deploy Alloy using Helm with the following values, which scrape the Kubernetes events and forward them to Loki:
alloy:
  configMap:
    content: |
      loki.process "default" {
        stage.replace {
          expression = "(\"type\":\"Normal\")"
          replace = "\"type\":\"Normal\",\"level\":\"info\""
        }
        stage.replace {
          expression = "(\"type\":\"Warning\")"
          replace = "\"type\":\"Warning\",\"level\":\"warning\""
        }
        stage.json {
          expressions = {
            "k8s_resource_kind" = "kind",
            "k8s_resource_name" = "name",
          }
        }
        stage.labels {
          values = {
            "k8s_namespace_name" = "namespace",
            "k8s_resource_kind" = "k8s_resource_kind"
          }
        }
        stage.structured_metadata {
          values = {
            "k8s_resource_name" = "k8s_resource_name"
          }
        }
        stage.label_keep {
          values = ["cluster", "organization", "region", "job", "k8s_namespace_name", "k8s_resource_kind"]
        }
        forward_to = [loki.write.default.receiver]
      }
      loki.source.kubernetes_events "default" {
        forward_to = [loki.process.default.receiver]
        log_format = "json"
      }
      loki.write "default" {
        endpoint {
          url = "http://loki-gateway.logging.svc/loki/api/v1/push"
        }
        external_labels = {
          "cluster" = "my-cluster",
          "environment" = "production",
          "region" = "europe-west1",
        }
      }
  enabled: true
controller:
  type: statefulset
The configuration performs the following actions:

- loki.source.kubernetes_events - Scrapes the Kubernetes events and forwards them to the Loki processor.
- loki.process - Handles the Kubernetes events by deriving a level field from the type field and adding labels and structured metadata. The structured metadata is crucial for filtering and searching the events, and Grafana uses the level field to assess the severity of each event. The label k8s_resource_kind differentiates between the various Kubernetes kinds alongside k8s_namespace_name, which indicates the namespace the resource is in. They’re indexed, but Kubernetes resource kinds typically shouldn’t lead to label cardinality issues since they’re usually limited in number. However, if you have many different API kinds, you might want to consider an alternative approach.
- loki.write - Forwards the processed events to Loki. The external_labels field adds additional labels to the events, such as the cluster, environment, and region.
- controller - Specifies the type of controller to deploy Alloy as, in this case a statefulset. You only need a single instance of Alloy to scrape the Kubernetes events.
The events should be flowing into Loki after deploying Alloy with the preceding configuration. You can verify this by querying the Loki API for the Kubernetes events:
sum (count_over_time({job="loki.source.kubernetes_events"} | json [1m])) by (k8s_namespace_name, k8s_resource_kind, type)
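To drill into the events for a single resource, you can additionally filter on the structured metadata added by the processing stage. A minimal sketch, assuming the labels configured above; the namespace, kind, and resource name are placeholders:

{job="loki.source.kubernetes_events", k8s_namespace_name="default", k8s_resource_kind="Pod"} | k8s_resource_name="my-app-7d9c5b6f4-abcde" | json | type="Warning"

Because k8s_resource_name is stored as structured metadata rather than an indexed label, it can hold high-cardinality values such as pod names without bloating the index.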
Configuring Loki to Write Metrics to Prometheus
Loki’s strong suit isn’t aggregation over long periods of time or complex queries, which is where Prometheus comes in. Prometheus is a time series database that excels at storing and querying time series data. Therefore, instead of running complex log queries over long time periods, the ruler’s remote_write feature combined with recording rules runs those queries over short intervals at a regular cadence and writes the results to Prometheus. The goal is to count the number of events by k8s_namespace_name, k8s_resource_kind, and type every minute and store that count in Prometheus. This way the data can be queried cheaply in Grafana without putting too much pressure on Loki.
To write metrics from Loki to Prometheus, you need to configure Loki. Deploy Loki using Helm with the following values:
loki:
  rulerConfig:
    remote_write:
      client:
        url: http://prometheus-k8s.monitoring.svc:9090/api/v1/write
      enabled: true
    rule_path: /rules
    storage:
      local:
        directory: /rules
      type: local
    wal:
      dir: /var/loki/ruler/wal
Replace prometheus-k8s.monitoring.svc with your Prometheus service endpoint. The configuration writes the metrics to Prometheus using remote write.
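Note that Prometheus only accepts remote writes when its remote write receiver is enabled (the --web.enable-remote-write-receiver flag). If you run the Prometheus Operator, as the prometheus-k8s service name suggests, a minimal sketch of the relevant setting, assuming a Prometheus resource named k8s in the monitoring namespace:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  # Allow Loki's ruler to push recorded metrics via /api/v1/write.
  enableRemoteWriteReceiver: true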
Loki also requires configuration to load rules from ConfigMaps. The following configuration enables a sidecar container that loads the rules from a ConfigMap:
sidecar:
  rules:
    folder: /rules/fake
    label: loki.grafana.com/rule
    labelValue: "true"
    searchNamespace: ALL
The sidecar loads rules from any ConfigMap with the label loki.grafana.com/rule=true and stores them in the folder /rules/fake. Single-tenant deployments use the fake tenant folder.
Adding Prometheus Rules to Loki
To write metrics to Prometheus, you need to add Prometheus rules to Loki. Create a ConfigMap with the following rules:
apiVersion: v1
data:
  kubernetes-events.yaml: |-
    "groups":
    - "interval": "1m"
      "name": "kubernetes-events.rules"
      "rules":
      - "expr": |
          sum (count_over_time({job="loki.source.kubernetes_events"} | json [1m])) by (k8s_namespace_name, k8s_resource_kind, type)
        "record": "namespace_kind_type:kubernetes_events:count1m"
kind: ConfigMap
metadata:
  labels:
    loki.grafana.com/rule: "true"
  name: kubernetes-events
  namespace: logging
If everything is configured correctly, you can query the following metric in your Prometheus instance:
namespace_kind_type:kubernetes_events:count1m
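With the recorded series in Prometheus, longer-range aggregations become cheap. As a rough example, a query for the top sources of warning events over the last week (assuming the recording rule above has been evaluating for that long):

topk(10, sum by (k8s_namespace_name, k8s_resource_kind) (sum_over_time(namespace_kind_type:kubernetes_events:count1m{type="Warning"}[1w])))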
Grafana Dashboards
Now that you have the Kubernetes events in Loki and Prometheus, you can visualize them in Grafana.
As mentioned previously, the kubernetes-events-mixin has two dashboards: a Kubernetes events overview and a Kubernetes events timeline. The upcoming sections describe each dashboard.
Kubernetes Events Overview Dashboard
The Kubernetes events overview dashboard focuses on providing an overview of Kubernetes events. It primarily uses the Prometheus metrics to visualize the events. The core parts of the dashboard are:

- Summary - Provides a section that summarizes events over time by kind, namespace, and type. It also shows the top sources of warning and normal events over the last week.
- Kind Summary - Provides a section that shows events by kind and namespace using the applied filters. It also shows a pie chart with a breakdown by type.
Kubernetes Events Timeline Dashboard
The Kubernetes events timeline dashboard focuses on providing a timeline of Kubernetes events. It uses Loki logs to visualize the events. The dashboard offers more detailed insights into individual events but requires more aggressive filtering, limiting visualization to kind and namespace only. The dashboard also isn’t very useful without a search for the name of the resource that originated the event; an illustrative query sketch follows the list. The core parts of the dashboard are:

- Events Logs - Displays events in a log panel limited to 100 entries, showing the name, type, and message. Name searches are highly recommended; otherwise the logs are too noisy because they come from too many sources of events.
- Events Timeline - Displays a timeline of events by kind and namespace, showing the type, reason, and message of each event. Again, name searches are highly recommended.
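To give a sense of the underlying log queries, here is a rough LogQL sketch of the kind of name search the timeline dashboard relies on; the mixin’s actual panel queries differ, and the namespace, kind, and name pattern are placeholders:

{job="loki.source.kubernetes_events", k8s_namespace_name="default", k8s_resource_kind="Deployment"} | json | name=~".*my-app.*"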
Summary
This blog post explored configuring Loki and Alloy to efficiently scrape Kubernetes events and visualize them in Grafana. The post presented an opinionated approach using Loki, Prometheus, and Alloy as the tools of choice. It also introduced the kubernetes-events-mixin, where you can find a set of Grafana dashboards and Prometheus rules for monitoring Kubernetes events. This approach is an awesome improvement over previous event monitoring setups I’ve had. The Grafana UI, specifically the timeline panel, displays events over time in a great way.