Waiting for CI to finish slows down development and can be extremely annoying, especially when CI fails and you have to run it again. Let's look at approaches to speed up your CI and minimize the time developers spend waiting for it to finish.
We'll go through 6 different methods to speed up CI:
- Makisu as your Docker build tool
- Caching between Stages
- Concurrent Jobs for each Stage
- Running Tests across all CPUs
- Parallel Tasks
- Autoscaling CI Runners
Building Docker Images with Makisu + Redis KV storage
We've tried three approaches: default image building with Docker using --cache-from, Google's Kaniko with --cache=true, and Uber's Makisu with Redis KV storage.
Kaniko is Google's image build tool which enables building images without a Docker daemon, making it great for building images within a Kubernetes cluster, as it is not possible to run a Docker daemon in a standard Kubernetes cluster.
Makisu is also an image build tool that does not rely on the Docker daemon, so it works well within a Kubernetes cluster. It also adds a couple of new things, e.g. distributed cache support and customizable layer caching, which lets you choose which layers to cache.
Docker provides its default image building with options such as --cache-from to choose images to cache from. Building images with Docker can be excluded as an option if you are running CI within a standard Kubernetes cluster.
We have had the fastest CI with Makisu even though the setup is more complex due to deploying Redis.
With Makisu we got the following perks:
- Distributed cache support: builds that share Dockerfile directives can share cache. The cache hit rate across branches is better than with Docker's --cache-from, which is great for single-branch caching.
- Great image compression for large images.
- Customizable layer generation and caching: you can choose which layers to generate and cache with the #!COMMIT annotation.
- Overall, we experienced faster build times than with the other build tools.
You can read more on why Makisu is great here. Now let's get going by deploying a Redis node for Makisu.
Deploying Redis (AWS Elasticache) with Terraform
Redis is an in-memory key-value store. We deploy it alongside Makisu because Makisu uses it as key-value storage to map Dockerfile directives to the hashes of the layers those directives produce. Elasticache is a fully managed Redis solution by AWS.
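To make that mapping concrete, here is a minimal Python sketch of the idea, with a plain dict standing in for Redis. The key scheme (a SHA-256 chained over the directive text) and the names cache_key, record_build and cached_layers are assumptions made for illustration; Makisu's real key derivation may well differ.

```python
import hashlib


def cache_key(parent_key: str, directive: str) -> str:
    """Chain the parent key with the directive, so changing a directive
    also changes the keys of every directive after it."""
    return hashlib.sha256(f"{parent_key}\n{directive}".encode()).hexdigest()


def record_build(directives, kv_store):
    """Store a (made-up) layer hash for each directive after a build."""
    key = ""
    for directive in directives:
        key = cache_key(key, directive)
        # Stand-in for the real layer digest produced by the build.
        kv_store[key] = "layer-" + key[:12]


def cached_layers(directives, kv_store):
    """Return the cached layer hash per directive, or None on a miss."""
    hits, key = [], ""
    for directive in directives:
        key = cache_key(key, directive)
        hits.append(kv_store.get(key))
    return hits


store = {}  # hypothetical stand-in for the Redis node
record_build(
    ["FROM python:3.8", "COPY . /app", "RUN pip install -r requirements.txt"],
    store,
)
```

In this sketch the keys depend only on the directives, so a build on another branch that starts with the same directives would hit the same entries, while a changed directive misses from that point onwards.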
Below is an example of how to deploy an Elasticache Redis service with Terraform. If you intend to deploy Redis in a different way, or have already done so, skip to the next section.
We'll create an Elasticache subnet group in your chosen VPC for the Redis node with Terraform:
resource "aws_elasticache_subnet_group" "gitlab_runner" {
  name       = "gitlab-runner"
  # Replace ci_runners_vpc with the name of your own VPC module.
  subnet_ids = module.ci_runners_vpc.private_subnets
}
And then we'll create the Elasticache Redis cluster within the subnet group created.
resource "aws_elasticache_cluster" "gitlab_runner" {
  cluster_id           = "gitlab-runner"
  engine               = "redis"
  node_type            = "cache.t3.micro"
  num_cache_nodes      = 1
  subnet_group_name    = aws_elasticache_subnet_group.gitlab_runner.name
  parameter_group_name = "default.redis5.0"
  engine_version       = "5.0.5"
  port                 = 6379
}
Note that the Elasticache node needs to be in the same VPC (Virtual Private Cloud, used to share a network amongst AWS resources) as the CI runners, otherwise the runners won't be able to access the Redis service.
Creating a VPC to share between the Elasticache node and your CI runners can be simplified with the Terraform VPC module:
module "gitlab_runner_vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name                    = "gitlab_runner_vpc"
  cidr                    = "10.1.0.0/16"
  azs                     = ["eu-west-1a"]
  private_subnets         = ["10.1.1.0/24"]
  public_subnets          = ["10.1.101.0/24"]
  map_public_ip_on_launch = false

  tags = {
    Environment = var.environment
    Terraform   = true
  }
  ...
}
Now you can specify the VPC created for both the CI VMs and the Elasticache subnet group. For example, we use the awesome terraform-aws-gitlab-runner Terraform module to deploy Gitlab runners on cheap spot instances. As shown below, you use vpc_id = module.gitlab_runner_vpc.vpc_id to place the CI VMs into the VPC created above.
module "gitlab_runner" {
  source = "npalm/gitlab-runner/aws"

  aws_region  = var.aws_region
  environment = var.environment

  vpc_id                   = module.gitlab_runner_vpc.vpc_id
  subnet_ids_gitlab_runner = module.gitlab_runner_vpc.private_subnets

  runners_name       = "gitlab_runner_honeylogic"
  runners_gitlab_url = "https://gitlab.com"
  runners_concurrent = "5"
  runners_limit      = "5"
  runners_idle_time  = "1800"
  runners_idle_count = "0"
  instance_type      = "t3.large"
  ...
}
Then when you create the Elasticache subnet group, specify the VPC's subnets you created previously:
resource "aws_elasticache_subnet_group" "gitlab_runner" {
  name       = "gitlab-runner"
  subnet_ids = module.gitlab_runner_vpc.private_subnets
}
Now you have Gitlab runners with access to a Redis (Elasticache) node, which Makisu uses as key-value storage for caching.
Using Makisu
Now that Redis is set up, we'll set up the Gitlab-CI stages (we use Gitlab-CI, but tailor the setup to your own CI). We create templates for common CI builds so they become reusable across projects. The template below builds a Docker image and tags it with latest and the short git commit SHA.
gitlab-ci-makisu-build.yml
.build:
  extends:
    - .default_vars
  stage: build
  image:
    name: gcr.io/makisu-project/makisu-alpine:v0.1.12
    entrypoint: [""]
  before_script:
    - echo "{\"$CI_REGISTRY\":{\".*\":{\"security\":{\"basic\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}}}" > $REGISTRY_CONFIG
  variables:
    REDIS_CACHE_ADDRESS: my-redis-address:6379
    REGISTRY_CONFIG: /registry-config.yaml

build:default:
  extends:
    - .default_vars
    - .build
  script:
    - /makisu-internal/makisu --log-fmt=console build --push=$CI_REGISTRY --modifyfs -t=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA --replica $CI_REGISTRY_IMAGE --redis-cache-addr=$REDIS_CACHE_ADDRESS --registry-config=$REGISTRY_CONFIG $CI_PROJECT_DIR
To better grasp the YAML snippet and the predefined environment variables, here is how the above job operates:
- Gitlab-CI clones the repository and populates a number of predefined environment variables.
- We echo the credentials, using predefined variables such as $CI_REGISTRY (the CI registry for the project) and $CI_REGISTRY_USER/$CI_REGISTRY_PASSWORD, into a registry config file which Makisu uses for authorization.
- The .build definition is just a template for all other Makisu build stages, so common tasks are stored in the template. Templating stages is a Gitlab-CI feature.
- We create two variables, the REGISTRY_CONFIG location and the REDIS_CACHE_ADDRESS, which are accessible by any stage that extends the .build stage.
- Now we can create the build:default job, which extends the .build stage, and add the script where we use Makisu to build the image.
- We pass a couple of flags: -t (required), the tag for the image, and --replica, which indicates the additional tags you want to push the image as. As mentioned previously, we use image:latest and image:git_commit_short_sha as tags, as you can see in the command.
- Remember to specify --redis-cache-addr so Makisu uses Redis as a KV store. You also always need to specify the build context, which is $CI_PROJECT_DIR in our case.
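The structure of the registry config is easier to read outside of the escaped echo command. The Python sketch below renders the same nesting with json.dumps; the function name registry_config and the example values are placeholders, and in CI the values would come from $CI_REGISTRY, $CI_REGISTRY_USER and $CI_REGISTRY_PASSWORD.

```python
import json


def registry_config(registry: str, username: str, password: str) -> str:
    """Render the same structure as the echo command in before_script."""
    config = {
        registry: {
            ".*": {
                "security": {
                    "basic": {"username": username, "password": password}
                }
            }
        }
    }
    return json.dumps(config)


# Placeholder values, used only to show the resulting document.
rendered = registry_config("registry.example.com", "ci-user", "s3cret")
print(rendered)
```

One practical difference from the echo approach is that json.dumps escapes characters such as double quotes in the password, which would otherwise break the generated file.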
Then we include the Makisu build stage in any repository where we need to build docker images:
include:
  - project: 'honeylogic/gitlab-ci-templates'
    ref: master
    file: 'gitlab-ci-base.yml'
  - project: 'honeylogic/gitlab-ci-templates'
    ref: master
    file: 'gitlab-ci-makisu-build.yml'
I've written a blog post on Gitlab-CI templates which will help you grasp how the pieces work together and how you extend and reuse stages from templates.
Uber's Makisu speeds up Docker builds considerably (30-40% on average for us; for Uber itself it is 40% faster on average and up to 90% faster), and it has many more extensible options. You do not have to use Redis; you can use an HTTP cache instead. It is also possible to not cache every Dockerfile directive, as some steps can be excessive to cache and turn into layers. An example is shown below, used together with the --commit=explicit flag: add a #!COMMIT comment and Makisu will only cache the layers marked with it.
Dockerfile
RUN pip install -r requirements.txt #!COMMIT
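As a rough illustration of the selection described above, the following Python sketch lists which directives of a Dockerfile carry the #!COMMIT annotation. It only mimics the behaviour described in this post; the function name explicitly_committed and the example Dockerfile are made up, and Makisu's own parser is not shown here.

```python
COMMIT_MARKER = "#!COMMIT"


def explicitly_committed(dockerfile_text: str):
    """Return the directives that end with the #!COMMIT annotation."""
    committed = []
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if stripped.endswith(COMMIT_MARKER):
            # Drop the marker and keep only the directive itself.
            committed.append(stripped[: -len(COMMIT_MARKER)].rstrip())
    return committed


# Hypothetical Dockerfile: only the pip install step is marked.
example = """\
FROM python:3.8
COPY requirements.txt .
RUN pip install -r requirements.txt #!COMMIT
COPY . /app
"""
print(explicitly_committed(example))
```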
Caching between Stages
This section is unrelated to the Docker image build caching covered previously.
When running CI you'll most likely have several stages, and sometimes you might run similar commands across stages. An example is dependencies, e.g. the NPM modules of a React project. You do not want to install all NPM modules for each stage; instead you want to cache them and make them reusable, so that repetitive jobs do not slow down the CI. With Gitlab, caching between stages is achievable by defining the cache keyword in your CI YAML file.
cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/
Now the node modules do not need to be recreated for each stage. Many other CIs support this as well, such as Travis-CI and Circle-CI.
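The cache key decides which jobs share a cache. As a rough illustration, the Python sketch below approximates how Gitlab derives CI_COMMIT_REF_SLUG from a branch name (lowercased, everything except 0-9 and a-z replaced with -, shortened to 63 characters, no leading or trailing -). The function name ref_slug is hypothetical and the logic is an approximation of the documented behaviour, not Gitlab's own code.

```python
import re


def ref_slug(ref_name: str) -> str:
    """Approximate CI_COMMIT_REF_SLUG for a branch or tag name."""
    slug = re.sub(r"[^0-9a-z]", "-", ref_name.lower())
    return slug[:63].strip("-")


# Jobs on the same branch resolve to the same key and share node_modules/;
# a different branch gets its own cache.
for branch in ("master", "feature/Add-Cache"):
    print(branch, "->", ref_slug(branch))
```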
Running Tests across all CPUs
Spread your tests across all CPUs to speed up CI, and make sure you run compute-based VMs. As I am most familiar with Python, I'll show a Python example.
Python with Pytest-xdist
Achieving this when running Python tests with pytest and the pytest-xdist plugin is simple. Pytest-xdist is a plugin which allows parallelization of tests, amongst other things.
Install pytest-xdist with pip:
pip install pytest-xdist
Add an argument indicating number of cores to be used or auto to automatically detect the number of cores the machine has.
pytest -n [auto/number of cores]
The speed up won't be highly noticeable for small test suites, but for anything larger it is great. Remember to use compute-based VMs with several vCPUs to maximize the effect of parallelization.
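To picture what spreading tests across cores means, here is a simplified Python sketch that deals test IDs out to one bucket per worker. It is only an illustration under the assumption of a static round-robin split; pytest-xdist schedules tests dynamically, and the function name split_across_workers and the test IDs are made up.

```python
import os


def split_across_workers(test_ids, workers=None):
    """Deal test IDs round-robin into one bucket per worker."""
    # Default to one worker per CPU, similar in spirit to -n auto.
    workers = workers or os.cpu_count() or 1
    buckets = [[] for _ in range(workers)]
    for index, test_id in enumerate(test_ids):
        buckets[index % workers].append(test_id)
    return buckets


# Hypothetical test IDs split across two workers.
tests = ["test_a", "test_b", "test_c", "test_d", "test_e"]
print(split_across_workers(tests, workers=2))
```

With more vCPUs there are more buckets, so each worker has fewer tests to run, which is why the effect grows with the size of the suite.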
Concurrent Jobs for each Stage
Spread the jobs of each stage across several VMs and run all of them concurrently. Do not stack jobs sequentially, e.g. run linting and testing as separate jobs, as they do not depend on each other. Gitlab supports this by default: all jobs in a stage run concurrently on different Gitlab-Runner VMs. For Gitlab-CI you only need to set the runner limit and the number of concurrent runners higher than 1. With the Terraform module mentioned previously you can do this with:
module "gitlab_runner" {
  source = "npalm/gitlab-runner/aws"

  runners_concurrent = "5"
  runners_limit      = "5"
}
You can read more in Gitlab's docs to grasp how to fully customize it. Other CIs, e.g. Circle-CI, provide the same functionality.
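The effect of the runner limit can be estimated with a small Python sketch: each job is handed to whichever runner frees up first, and the stage takes as long as the busiest runner. The job durations are hypothetical numbers and stage_wall_time is an illustrative helper, not part of Gitlab or the Terraform module.

```python
import heapq


def stage_wall_time(job_minutes, runners):
    """Estimate how long a stage takes with a given number of runners."""
    finish_times = [0.0] * runners
    heapq.heapify(finish_times)
    for minutes in job_minutes:
        # The next job starts on the runner that becomes free first.
        start = heapq.heappop(finish_times)
        heapq.heappush(finish_times, start + minutes)
    return max(finish_times)


jobs = [4, 3, 2]  # hypothetical durations, e.g. tests, lint, build
print(stage_wall_time(jobs, runners=1))  # jobs run one after another
print(stage_wall_time(jobs, runners=5))  # every job gets its own runner
```

With a single runner the stage takes the sum of all jobs; with enough runners it takes only as long as the slowest job.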
Parallel Tasks
We use Task as a task runner for CI. Task is a task runner / build tool which I prefer due to its syntax. In Task you can run all dependencies of a task in parallel, which makes the tasks complete faster.
With Task you can specify dependencies with deps:
test:
  deps:
    - setup xyz
    - setup abc
    - setup efg
  desc: Run tests
  cmds:
    - pytest -n auto --cov
All dependencies run in parallel for better and faster performance.
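The semantics can be sketched in Python with a thread pool: start every dependency at once, wait for all of them, then run the task's own command. This is only an illustration of the behaviour described above, not how Task is implemented, and run_task and the stand-in callables are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(deps, cmd):
    """Run all deps in parallel, wait for them, then run the command."""
    with ThreadPoolExecutor(max_workers=max(len(deps), 1)) as pool:
        # list() waits for every dependency and re-raises any error.
        list(pool.map(lambda dep: dep(), deps))
    return cmd()


log = []
# Stand-ins for the three setup tasks; each one just records its name.
deps = [
    (lambda name=name: log.append(name))
    for name in ("setup xyz", "setup abc", "setup efg")
]
result = run_task(deps, lambda: log.append("pytest") or "tests done")
print(log, result)
```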
Autoscaling your CI Runners
Create and manage your own runners, use AWS/GCP spot instances, and autoscale your runners so that jobs are not left waiting. This is easily achievable with the Terraform module terraform-aws-gitlab-runner: just increase the limit and the number of concurrent runners. If you are worried about costs, lower the idle timeout so instances scale down when not used. Make sure to use spot instances, as they are around 70% cheaper.
Summary
Several approaches to speed up your CI have been shown, some trivial and some more advanced. Even though the focus was on Gitlab-CI and Terraform, the same functionality should be achievable without Terraform and on a different CI platform. Long CI runs block and annoy developers, so we try to minimize CI completion time as much as possible.