Configuring VPA to Use Historical Metrics for Recommendations and Expose Them in Kube-state-metrics

Published on October 27, 2024, 16:29 UTC

The Vertical Pod Autoscaler (VPA) can both manage your pods’ resource requests and recommend what the requests and limits for a pod should be. Recently, the kube-state-metrics project removed built-in support for VPA recommendation metrics, so exposing those recommendations as metrics now takes additional configuration. This blog post covers how to configure the VPA to base its recommendations on historical Prometheus metrics, how to expose the recommendation metrics through kube-state-metrics, and how to visualize them in Grafana.

This blog post doesn’t cover how to install kube-state-metrics. It assumes that you already have it installed and only covers how to add the VPA recommendation metrics.

Installing the Vertical Pod Autoscaler

The first step is to install the VPA, which you can do with the Fairwinds Helm chart. Set the values to enable the Prometheus Operator’s PodMonitor and to configure the VPA to use Prometheus as its history provider, so that recommendations are based on historical data. Set the following values in your values.yaml:

recommender:
  podMonitor:
    enabled: true
  extraArgs:
    storage: prometheus
    # Adjust according to your Prometheus address
    prometheus-address: http://prometheus-k8s.monitoring:9090

    # https://github.com/kubernetes/autoscaler/issues/5031#issuecomment-1450583325
    prometheus-cadvisor-job-name: 'kubelet'
    container-pod-name-label: 'pod'
    container-namespace-label: 'namespace'
    container-name-label: 'container'
    metric-for-pod-labels: 'kube_pod_labels{job="kube-state-metrics"}[8d]'
    pod-namespace-label: 'namespace'
    pod-name-label: 'pod'
    pod-label-prefix: 'label_'
updater:
  podMonitor:
    enabled: true

The Prometheus-related extraArgs are necessary because the default values for these arguments have become outdated and don’t match the default kube-state-metrics labels.

Generic VPA metrics that show VPA performance and activity are now available in Prometheus through the scraped pod metrics, and the VPA recommendations are based on historical data.

Update 2025-03-21: The Prometheus integration doesn’t seem to work well. It fetches the metrics and stores them in a ClusterState, then uses the default Kubernetes recommended labels to match the stored metrics to the VPA, which doesn’t fully work. Discussion exists here.

Adding VPA recommendation metrics

As mentioned previously, the kube-state-metrics project removed built-in support for VPA recommendation metrics, so you need to configure kube-state-metrics to add them.

First, adjust the ClusterRole for kube-state-metrics to include the following rules:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/part-of: kube-prometheus
  name: kube-state-metrics
rules:
    # ... other rules
    # Add the following rules which allow kube-state-metrics to read VPA resources
    - apiGroups:
      - autoscaling.k8s.io
      resources:
      - verticalpodautoscalers
      verbs:
      - list
      - watch
    - apiGroups:
      - apiextensions.k8s.io
      resources:
      - customresourcedefinitions
      verbs:
      - list
      - watch

Next, convert the status of the VPA resource to Prometheus metrics using a kube-state-metrics CustomResourceStateMetrics configuration. This is a configuration document that kube-state-metrics reads at startup, not a CustomResourceDefinition installed in the cluster. Pass it with the --custom-resource-state-config argument when starting kube-state-metrics:

kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.13.0
  name: kube-state-metrics
  namespace: monitoring
spec:
    ...
      containers:
      - args:
        ...
        - --custom-resource-state-config
        - |
          kind: CustomResourceStateMetrics
          spec:
            resources:
              - groupVersionKind:
                  group: autoscaling.k8s.io
                  kind: "VerticalPodAutoscaler"
                  version: "v1"
                labelsFromPath:
                  verticalpodautoscaler: [metadata, name]
                  namespace: [metadata, namespace]
                  target_api_version: [spec, targetRef, apiVersion]
                  target_kind: [spec, targetRef, kind]
                  target_name: [spec, targetRef, name]
                metrics:
                  # Labels
                  - name: "verticalpodautoscaler_labels"
                    help: "VPA container recommendations. Kubernetes labels converted to Prometheus labels"
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                  # Memory Information
                  - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_target"
                    help: "VPA container recommendations for memory. Target resources the VerticalPodAutoscaler recommends for the container."
                    each:
                      type: Gauge
                      gauge:
                        path: [status, recommendation, containerRecommendations]
                        valueFrom: [target, memory]
                        labelsFromPath:
                          container: [containerName]
                    commonLabels:
                      resource: "memory"
                      unit: "byte"
                  - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_lowerbound"
                    help: "VPA container recommendations for memory. Minimum resources the container can use before the VerticalPodAutoscaler updater evicts it"
                    each:
                      type: Gauge
                      gauge:
                        path: [status, recommendation, containerRecommendations]
                        valueFrom: [lowerBound, memory]
                        labelsFromPath:
                          container: [containerName]
                    commonLabels:
                      resource: "memory"
                      unit: "byte"
                  - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_upperbound"
                    help: "VPA container recommendations for memory. Maximum resources the container can use before the VerticalPodAutoscaler updater evicts it"
                    each:
                      type: Gauge
                      gauge:
                        path: [status, recommendation, containerRecommendations]
                        valueFrom: [upperBound, memory]
                        labelsFromPath:
                          container: [containerName]
                    commonLabels:
                      resource: "memory"
                      unit: "byte"
                  - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_uncappedtarget"
                    help: "VPA container recommendations for memory. Target resources the VerticalPodAutoscaler recommends for the container ignoring bounds"
                    each:
                      type: Gauge
                      gauge:
                        path: [status, recommendation, containerRecommendations]
                        valueFrom: [uncappedTarget, memory]
                        labelsFromPath:
                          container: [containerName]
                    commonLabels:
                      resource: "memory"
                      unit: "byte"
                  # CPU Information
                  - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_target"
                    help: "VPA container recommendations for cpu. Target resources the VerticalPodAutoscaler recommends for the container."
                    each:
                      type: Gauge
                      gauge:
                        path: [status, recommendation, containerRecommendations]
                        valueFrom: [target, cpu]
                        labelsFromPath:
                          container: [containerName]
                    commonLabels:
                      resource: "cpu"
                      unit: "core"
                  - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_lowerbound"
                    help: "VPA container recommendations for cpu. Minimum resources the container can use before the VerticalPodAutoscaler updater evicts it"
                    each:
                      type: Gauge
                      gauge:
                        path: [status, recommendation, containerRecommendations]
                        valueFrom: [lowerBound, cpu]
                        labelsFromPath:
                          container: [containerName]
                    commonLabels:
                      resource: "cpu"
                      unit: "core"
                  - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_upperbound"
                    help: "VPA container recommendations for cpu. Maximum resources the container can use before the VerticalPodAutoscaler updater evicts it"
                    each:
                      type: Gauge
                      gauge:
                        path: [status, recommendation, containerRecommendations]
                        valueFrom: [upperBound, cpu]
                        labelsFromPath:
                          container: [containerName]
                    commonLabels:
                      resource: "cpu"
                      unit: "core"
                  - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_uncappedtarget"
                    help: "VPA container recommendations for cpu. Target resources the VerticalPodAutoscaler recommends for the container ignoring bounds"
                    each:
                      type: Gauge
                      gauge:
                        path: [status, recommendation, containerRecommendations]
                        valueFrom: [uncappedTarget, cpu]
                        labelsFromPath:
                          container: [containerName]
                    commonLabels:
                      resource: "cpu"
                      unit: "core"

The preceding configuration converts the VPA recommendation status to metrics that can be scraped by Prometheus. They are available at the kube-state-metrics metrics endpoint.
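One way to confirm that the new series are actually exposed is to fetch the metrics endpoint and filter for the VPA metric names. The following is a minimal Python sketch, not part of the original setup. The URL `http://localhost:8080/metrics` is an assumption (for example, a local port-forward to kube-state-metrics); your port and any proxy in front of it may differ. The filter matches on the substring `verticalpodautoscaler_` so that it doesn't depend on any prefix kube-state-metrics adds to custom resource metrics.

```python
from urllib.request import urlopen


def vpa_metric_lines(exposition_text: str) -> list[str]:
    """Return the sample lines that mention the VPA metrics.

    Lines starting with "#" (HELP and TYPE comments) are skipped.
    """
    return [
        line
        for line in exposition_text.splitlines()
        if "verticalpodautoscaler_" in line and not line.startswith("#")
    ]


def fetch_metrics(url: str = "http://localhost:8080/metrics") -> str:
    """Fetch the raw metrics text. The default URL is an assumption."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")
```

Calling `vpa_metric_lines(fetch_metrics())` should return an empty list until at least one VPA with a recommendation exists in the cluster.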

Creating a Vertical Pod Autoscaler

To create a VPA, you can use the following example:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  labels:
    app.kubernetes.io/instance: hodovi-cc
    app.kubernetes.io/name: hodovi-cc
    app.kubernetes.io/version: 9ec45b512c915bfe2fabc1671713935890602534
  name: hodovi-cc
  namespace: apps
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hodovi-cc
  updatePolicy:
    updateMode: "Off" # Disable automatic updates

After the VPA is created, its status field is updated with the recommendations, and kube-state-metrics converts that status to metrics that Prometheus can scrape. An example of the status for the VPA hodovi-cc is:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  labels:
    app.kubernetes.io/instance: hodovi-cc
    app.kubernetes.io/name: hodovi-cc
    app.kubernetes.io/version: 9ec45b512c915bfe2fabc1671713935890602534
  name: hodovi-cc
  namespace: apps
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hodovi-cc
  updatePolicy:
    updateMode: "Off" # Disable automatic updates
status:
  conditions:
  - lastTransitionTime: "2024-08-25T20:22:15Z"
    status: "True"
    type: RecommendationProvided
  recommendation:
    containerRecommendations:
    - containerName: hodovi-cc
      lowerBound:
        cpu: 15m
        memory: "246562508"
      target:
        cpu: 23m
        memory: "297164212"
      uncappedTarget:
        cpu: 23m
        memory: "297164212"
      upperBound:
        cpu: 97m
        memory: "1254364671"

Now, when Prometheus scrapes kube-state-metrics, all the metrics required to visualize VPA recommendations are available.
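The status stores recommendations as Kubernetes quantity strings such as `23m`, while Prometheus gauges are plain floats. The sketch below works through the arithmetic for the hodovi-cc example, assuming that kube-state-metrics converts quantities to base units (cores and bytes), which the `unit: core` and `unit: byte` labels in the configuration imply. The `parse_quantity` helper is a hypothetical illustration that handles only a small subset of the quantity notation. It is not the kube-state-metrics implementation.

```python
from decimal import Decimal

# Multipliers for the subset of Kubernetes quantity suffixes handled here.
SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
}


def parse_quantity(quantity: str) -> float:
    """Convert a quantity string such as "23m" or "297164212" to base units."""
    # Check two-character suffixes first so "Mi" is not mistaken for another suffix.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return float(Decimal(quantity[: -len(suffix)]) * SUFFIXES[suffix])
    return float(Decimal(quantity))


# Container recommendation copied from the hodovi-cc status example.
recommendation = {
    "lowerBound": {"cpu": "15m", "memory": "246562508"},
    "target": {"cpu": "23m", "memory": "297164212"},
    "uncappedTarget": {"cpu": "23m", "memory": "297164212"},
    "upperBound": {"cpu": "97m", "memory": "1254364671"},
}

for bound, resources in recommendation.items():
    for resource, quantity in resources.items():
        print(f"{bound:<15} {resource:<7} {quantity:>11} -> {parse_quantity(quantity)}")
```

Under that assumption, the CPU target of `23m` would appear as a gauge value of 0.023 cores, and the memory target as 297164212 bytes.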

Visualizing VPA recommendations in Grafana

I’ve created a Grafana dashboard that visualizes the VPA recommendations. You can find the dashboard here. I’ve also written a blog post on comprehensive Kubernetes autoscaling monitoring with Prometheus and Grafana here and created a kubernetes-autoscaling-mixin that includes all the dashboards and alerts for Kubernetes autoscaling components.

The Grafana dashboard provides an overview of the VPA recommendations for both memory and CPU. It includes the following panels:

  • Namespace Summary - Provides an overview of the VPA recommendations per namespace. See the memory and CPU target and lower and upper bounds for each VPA in the selected namespace.
  • VPA Summary - Provides a history of recommendations for the selected VPA. See the historical memory and CPU target and lower and upper bounds for each container in the selected VPA. It also provides a summary of the resource configuration that would be required for the Guaranteed and Burstable QoS classes.
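To make the QoS summary concrete, here is a small Python sketch of how such a suggestion could be derived from a single container recommendation. The dashboard's actual queries are not shown in this post, so the mapping is an assumption: for Guaranteed, requests and limits are both set to the target (Kubernetes requires requests to equal limits for that class), and for Burstable, requests are set to the target and limits to the upper bound. The function name `qos_suggestions` is hypothetical.

```python
def qos_suggestions(container_recommendation: dict) -> dict:
    """Derive example resource settings per QoS class from a VPA recommendation.

    Assumed mapping (not taken from the dashboard):
      Guaranteed: requests == limits == target
      Burstable:  requests == target, limits == upperBound
    """
    target = container_recommendation["target"]
    upper_bound = container_recommendation["upperBound"]
    return {
        "Guaranteed": {"requests": dict(target), "limits": dict(target)},
        "Burstable": {"requests": dict(target), "limits": dict(upper_bound)},
    }


# Values copied from the hodovi-cc status example.
hodovi_cc = {
    "target": {"cpu": "23m", "memory": "297164212"},
    "upperBound": {"cpu": "97m", "memory": "1254364671"},
}

for qos_class, resources in qos_suggestions(hodovi_cc).items():
    print(qos_class, resources)
```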

[Image: Vertical-pod-autoscaler-1]

[Image: Vertical-pod-autoscaler-2]

