
Creating a Low Cost Managed Kubernetes Cluster for Personal Development using Terraform


Kubernetes is an open-source system, popular amongst developers, that automatically deploys, scales, and manages containerized applications. Yet for those working outside a traditional startup or corporate setting, Kubernetes can be CPU- and memory-intensive and costly to run, disincentivizing developers from using a strategically beneficial tool. We'll highlight some of the constraints of using Kubernetes and offer a low-cost, practically beneficial solution: Google Kubernetes Engine (GKE) deployed and managed with Terraform.

Want to run Kubernetes locally? kind is a tool that's remarkably easy to set up and experiment with, and it even supports multi-node clusters! However, the abundance of pods and nodes can be CPU/memory-intensive. Another potential roadblock is the popularity of managed Kubernetes solutions: since many companies go this route, you'll want some experience with a managed solution. Gaining this experience in a non-corporate setting means personally paying for managed Kubernetes –– a costly investment for a solo developer. The baseline cost for an EKS cluster is $72 per month, with similar pricing for regional AKS and GKE clusters. Add in the node costs for running your workloads and, before you know it, you're running up a hefty monthly bill. Here are some potential ways to address compute and cost constraints with Kubernetes.
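For comparison before moving to the cloud, here is a minimal sketch of a multi-node kind configuration (the cluster name and the number of workers are arbitrary choices for illustration):

# kind-config.yaml -- one control plane plus two workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: local-playground # hypothetical name
nodes:
  - role: control-plane
  - role: worker
  - role: worker

Create it with kind create cluster --config kind-config.yaml. Each node runs as a Docker container on your machine, which is exactly where the local CPU/memory pressure comes from.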

Lowering costs

There are two main ways to lower the cost of a managed Kubernetes cluster:

  • A free cluster (no monthly cost for the control plane).
  • Spot instances/preemptible VMs - instances that can be terminated at any time, typically with only a couple of minutes' notice.

AWS does not offer a free cluster in any form; you have a $72 monthly cost from the get-go. On top of that, its managed node groups do not currently offer spot instances, so you'd have to use custom node groups, which can be set up with eksctl or the EKS Terraform modules. As for AKS, I've never used AKS or Azure Cloud in any form, so I won't explore that option.
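If you do go the EKS route, a self-managed node group running entirely on spot instances can be declared in an eksctl ClusterConfig roughly like this (the cluster name, region, and instance types are placeholders):

# spot-cluster.yaml -- eksctl config with a 100% spot node group
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster  # hypothetical name
  region: eu-west-1 # hypothetical region
nodeGroups:
  - name: spot-ng
    minSize: 1
    maxSize: 3
    instancesDistribution:
      instanceTypes: ["m5.large", "m4.large"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0 # everything above base is spot
      spotInstancePools: 2

Apply it with eksctl create cluster -f spot-cluster.yaml.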

Google Cloud offers one free zonal cluster per account. A zonal cluster has a Kubernetes control plane available in only a single zone, which means that if there is an outage in that zone, the unavailable control plane prevents you from configuring your workloads. However, GKE still has an SLA guaranteeing 99.5% uptime for a zonal cluster, and since you'd be using the cluster for personal learning and experimentation, you'd most likely not be heavily affected by downtime.

Another feature of Google Cloud is the ability to create a managed node pool. Additional benefits include preemptible VMs for your GKE cluster, along with a Terraform resource that supports creating node pools with preemptible VMs. Google Cloud's preemptible VMs have no guaranteed uptime and are terminated after at most 24 hours, so you'll usually see a couple of minutes of downtime daily for all workloads scheduled on preemptible VMs. However, these instances are up to 80% cheaper!
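As an aside, GKE labels preemptible nodes with cloud.google.com/gke-preemptible, so once a cluster is running you can see exactly which nodes are preemptible:

kubectl get nodes -l cloud.google.com/gke-preemptible=true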

Creating a Zonal Cluster with a Preemptible Node Group using Terraform

To create a zonal cluster you need to specify only a single zone as the cluster location, but the nodes can be located in multiple zones.

resource "google_container_cluster" "default" {
  name               = "my-cluster"
  location           = "europe-west1-b" # MUST BE A SINGLE ZONE, OTHERWISE IT COUNTS AS A REGIONAL CLUSTER
  min_master_version = "version"

  node_locations = ["europe-west1-b", "europe-west1-c"] # CAN BE MULTI ZONE

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1
}
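For completeness: the resource above assumes a configured Google provider, along the lines of the following sketch (the project ID is a placeholder):

provider "google" {
  project = "my-project-id" # hypothetical project ID
  region  = "europe-west1"
}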

We will reference the above cluster when creating a node pool.

  • The node pool will consist of memory-optimized n2d-highmem-4 instances with preemptible set to true. An n2d-highmem-4 instance costs $133.16 monthly, but running it in a preemptible node pool brings the cost down to $40.27.
  • These instances have 4 vCPUs and 32GB of memory, making it possible to run many different workloads: you can deploy prometheus-operator, elasticsearch, and anything else you'd like to learn!
  • You can scale down and run smaller instance types such as the n2d-standard-2 machine type, which has 2 vCPUs and 8GB of memory and is priced at $14.93 monthly in a preemptible node pool ($49.36 monthly otherwise).
  • There are even smaller instances such as the shared-core f1-micro ($4.53 monthly on-demand / $1.36 preemptible) and g1-small ($11.76 monthly on-demand / $3.54 preemptible) instance types, but they have low computing power.

There are some additional costs for networking and storage, but for a small cluster they should be minimal compared to the instance and control plane costs.

resource "google_container_node_pool" "memory_optimized" {
  name     = "Memory optimized node pool"
  cluster  = google_container_cluster.default.name

  version = "my-version"

  initial_node_count = 1
  autoscaling {
    min_node_count = 1
    max_node_count = 3
  }

  management {
    auto_repair = true
  }

  node_config {
    machine_type = "n2d-highmem-4" # Memory-optimized
    preemptible  = true # Preemptible needs to be true
  }
}
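Once terraform apply has finished, you can point kubectl at the new cluster with gcloud (the zone and cluster name must match the values used above):

gcloud container clusters get-credentials my-cluster --zone europe-west1-b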

Tainting nodes

If you prefer higher uptime for some applications and want to keep certain pods off the preemptible node pool, you can taint it. Tainting means that pods are not scheduled onto a particular node pool unless they tolerate the set taints. The example below sets a taint with the key role, the value ops, and the effect NO_SCHEDULE, meaning that a pod needs to tolerate this taint to be scheduled onto the nodes in that node pool.

resource "google_container_node_pool" "memory_optimized" {
  name     = "Memory optimized node pool"
  cluster  = google_container_cluster.default.name

  version = "my-version"

  initial_node_count = 1
  autoscaling {
    min_node_count = 1
    max_node_count = 3
  }

  management {
    auto_repair = true
  }

  node_config {
    machine_type = "n2d-standard-4"
    preemptible  = true # Preemptible needs to be true

    # Set the taints
    taint {
      key    = "role"
      value  = "ops"
      effect = "NO_SCHEDULE"
    }
  }
}

To tolerate the node pool's taint, add a tolerations section to the pod spec:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: '{{ app_name }}'
  labels:
    app: '{{ app_name }}'
  namespace: '{{ namespace }}'
spec:
  selector:
    matchLabels:
      app: '{{ app_name }}'
  template:
    metadata:
      name: '{{ app_name }}'
      labels:
        app: '{{ app_name }}'
    spec:
      ############### This section is needed ###############
      tolerations:
        - effect: NoSchedule
          key: role
          operator: Equal
          value: ops
      ######################################################
      containers:
        <your containers>
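Note that a toleration only allows pods onto the tainted nodes; it doesn't require them to run there. To pin a workload to the pool, combine the toleration with a nodeSelector on the node pool label that GKE sets automatically (the pool name here matches the Terraform resource above):

      nodeSelector:
        cloud.google.com/gke-nodepool: memory-optimized-node-pool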

Now you can create cheap preemptible node pools that are dedicated to specific workloads. For example, if you want to learn how to create an Elasticsearch cluster with multiple replicas, you might want a preemptible node pool for it, since it requires a lot of computing power. Or you can use only preemptible node pools for your whole cluster. Combine that with GKE's free zonal cluster and you can bring costs down to roughly $20 monthly with a decently sized instance type such as n2-standard-2 (you can verify this with GCP's pricing calculator). Take advantage of it, and get experience using a managed Kubernetes solution.

