Docker has fundamentally changed how developers build and deploy applications, with most companies leveraging containers in some capacity. At Blacksmith, we regularly see our customers' Docker builds taking 30 minutes or more, which can significantly hinder developer productivity and delay the deployment of hotfixes.
In this post, we'll give you the exact steps needed to set up a remote BuildKit instance on AWS. The 30 minutes it takes to follow this blog can dramatically speed up Docker builds for your entire org. But before diving deeper, let's review some Docker fundamentals. There are three primary levers one can pull to optimize Docker build times:

1. Running builds on a more powerful machine.
2. Maintaining a persistent build cache that survives across builds.
3. Structuring your Dockerfile so that rarely-changing layers come first and stay cached.
BuildKit is a modern backend that replaces the legacy Docker builder, offering improved performance and new features.
The most relevant feature for our use case, though, is BuildKit's ability to execute builds on a remote instance. This lets us offload the build process from the local machine to a more powerful remote server.
BuildKit achieves this by using a client-server architecture. The BuildKit client, which runs on your local machine or CI/CD runner, communicates with the remote BuildKit daemon over a secure connection. When you initiate a build, the client sends the build context (Dockerfile, source code, etc.) to the remote daemon, which executes the build and streams logs back to the client.
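You can see this in action from any machine with Docker installed. Here's a quick sketch, assuming a remote daemon is already listening (the endpoint address is a placeholder):

```bash
# Register a buildx builder that targets a remote BuildKit daemon
# (tcp://203.0.113.10:9999 is a placeholder endpoint)
docker buildx create --name remote-builder --driver remote tcp://203.0.113.10:9999

# Builds now execute on the remote daemon; logs stream back locally
docker buildx build --builder remote-builder -t my-app:latest .
```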
Let's circle back to the first two levers we mentioned for accelerating Docker builds: using a powerful machine and having a persistent build cache. BuildKit allows you to run a remote Docker builder instance on any cloud provider, and we'll walk through an example of setting this up on AWS.
By running BuildKit on AWS, we can:

- Run builds on a much beefier machine than a standard CI runner.
- Keep the build cache on the instance's disk, so it persists across builds.
- Share that cache across every CI run in the org.
We've created a Terraform configuration file in this repository https://github.com/useblacksmith/remote-buildkit-terraform to automate the provisioning and configuration of the necessary resources for running our remote BuildKit instance on AWS. At a high level, here's what it does:
- Provisions a `c5a.4xlarge` EC2 instance to host the BuildKit daemon. This instance type provides 16 vCPUs and 32 GiB of memory on AMD EPYC processors, giving builds plenty of parallelism.
- Creates an IAM role (`GithubActionsBuildKitRole`) that your GitHub Actions workflows assume, scoped to the GitHub org and repo you configure.
- Deploys to `us-east-2` by default, but you can run it in whichever region you prefer. However, you should verify that the Amazon Machine Image (AMI) we're using is available in that region.

Note that our BuildKit instance is configured to listen on port `9999` of the EC2 instance. The instance's public IP address and this port are essential for configuring the GitHub Actions workflow to connect to the remote BuildKit instance.
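For reference, here's a minimal sketch of what the daemon invocation on the instance looks like (the actual setup lives in the Terraform repo):

```bash
# Run the BuildKit daemon listening for TCP connections on port 9999
buildkitd --addr tcp://0.0.0.0:9999
```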
To start, follow these steps:

1. Update `terraform.tfvars` to point to where you're running your GitHub Actions:
   - `github_org`: your GitHub organization
   - `github_repo`: your repository name
2. Run `terraform init`
3. Run `terraform plan`
4. Run `terraform apply`
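For example, a minimal `terraform.tfvars` might look like this (the org and repo names below are placeholders):

```hcl
# terraform.tfvars — placeholder values, substitute your own
github_org  = "acme-corp"
github_repo = "example-service"
```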
Once the apply completes, you'll see the following output; take note of it:

```
buildkit_instance_public_ip = <IP-name>
github_actions_role_arn = "arn:aws:iam::<ACCOUNT-ID>:role/GithubActionsBuildKitRole"
```
Once the remote BuildKit instance is up and running, it's time to modify your secrets and the GitHub Actions workflow files that run Docker builds. Add the following repository secrets:

- `AWS_ACCOUNT_ID`: your AWS account ID (it appears in the `github_actions_role_arn` Terraform output).
- `BUILDKIT_HOST`: the public IP address of the provisioned EC2 instance.

Use the `BUILDKIT_HOST` secret along with port 9999 when specifying the remote BuildKit server endpoint (e.g., `tcp://${{ secrets.BUILDKIT_HOST }}:9999`).
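If you use the GitHub CLI, you can set these secrets from your terminal (the values below are placeholders):

```bash
# Set the repository secrets used by the workflow (placeholder values)
gh secret set AWS_ACCOUNT_ID --body "123456789012"
gh secret set BUILDKIT_HOST --body "203.0.113.10"
```

With the secrets in place, here's an example workflow: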
```yaml
steps:
  - uses: actions/checkout@v3
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/GithubActionsBuildKitRole
      aws-region: us-east-2
  - uses: docker/setup-buildx-action@v2
    with:
      driver: remote
      endpoint: tcp://${{ secrets.BUILDKIT_HOST }}:9999
  - uses: docker/build-push-action@v2
    with:
      context: .
      file: ./Dockerfile
      push: false
      tags: test-image:latest
      load: true
```
When we triggered a build, our first uncached run took 6 minutes 22 seconds.
When we reran the job, the cached run took only 1 minute 34 seconds.
As you can see from the logs, each layer had a cache hit, significantly improving the build time.
Caching Docker layers significantly improves build times; in this example, it cut the build from 6 minutes 22 seconds to just 1 minute 34 seconds. Docker layer caching is particularly effective when your "base" layers change infrequently, since Docker rebuilds only the layers from the first modified one onward and reuses the cached layers above it. Because this cache is shared across your entire org, all CI builds benefit from it.
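To make the most of that cache, order your Dockerfile so the slow, rarely-changing steps come first. A minimal sketch, assuming a Node.js app (adapt to your stack):

```dockerfile
# Dependency layers change rarely, so they stay cached on the
# remote BuildKit instance; only the source COPY busts the cache.
FROM node:20-slim
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci          # re-runs only when the lockfile changes
COPY . .
RUN npm run build   # re-runs whenever source files change
```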
Although using a single shared BuildKit instance hosted on an EC2 machine is a simple approach, we do want to call out some drawbacks that limit its scalability and effectiveness for larger engineering teams: the instance runs (and bills) around the clock even when no builds are happening, and concurrent builds from a large team contend for the same CPU, memory, and disk.
You could explore dynamically suspending and resuming EC2 instances, hot-loading EBS volumes, or using spot instances to reduce costs. The main tradeoff is that suspending and provisioning new instances would increase CI wait times, since provisioning a new instance for each build carries a cold-start penalty. AWS has pointers on decreasing the boot-up time for instances using EBS volumes.
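As a simple example of the first idea, a scheduled job could stop the builder overnight and start it before working hours. A sketch using the AWS CLI (the instance ID is a placeholder):

```bash
# Stop the builder when idle and start it again on demand
# (replace the instance ID with your own)
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 start-instances --instance-ids i-0123456789abcdef0
```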
In conclusion, using a powerful remote BuildKit instance can significantly reduce Docker build times for small to medium-sized teams. While there are scalability concerns, we’ve seen many teams get very far with this solution as it offers a simple yet effective way to improve build performance, allowing you to focus on what matters.