We now descend into the lowest layer of Blacksmith: our dataplane. If the control plane is the brain, the dataplane is the muscle. This is where your GitHub Actions jobs actually run, on our hardware. More specifically, on a fleet of 32-vCPU boxes procured from data center providers in the US and EU. But this layer isn't just bare metal machines; it is also home to Firecracker microVMs, GitHub Actions runners, MinIO blob stores, Ceph storage clusters, Tailscale VPNs, and much more. It is here, in the heat and pressure of the dataplane, that we isolate the execution of each GitHub Actions job across three axes: CPU, network, and disk.
The path into our dataplane is intentionally a difficult one, and it begins with a strong first line of defense: our network. Our network is secured with Tailscale, a VPN service that utilizes WireGuard, an open-source protocol for encrypted virtual private networks. With Tailscale, our fleet of bare metal machines lives behind a tight-knit, private network. Every one of them is part of a Tailscale tailnet, meaning SSH access is entirely locked down to the outside world. No public ports, no guessable IPs, no surprises. What's more, communication between services in the dataplane flows through the Tailscale VPN. And it doesn't stop there: all deployments to our machines happen exclusively over Tailscale SSH, ensuring encrypted, identity-based access between trusted devices only.
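To make that concrete, here is a minimal sketch, in Go, of how a dataplane service could expose itself only on the tailnet using Tailscale's tsnet library. The hostname, port, and handler are illustrative stand-ins, not our actual agent code:

```go
// A minimal sketch of exposing an internal dataplane service only on the
// tailnet via tsnet. Nothing here is bound to a public interface.
package main

import (
	"log"
	"net/http"
	"os"

	"tailscale.com/tsnet"
)

func main() {
	srv := &tsnet.Server{
		Hostname: "dataplane-host-01",     // illustrative node name on the tailnet
		AuthKey:  os.Getenv("TS_AUTHKEY"), // pre-authorized key, injected at deploy time
	}
	defer srv.Close()

	// Listen on the tailnet only; peers must be part of the same private network.
	ln, err := srv.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}

	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	log.Fatal(http.Serve(ln, nil))
}
```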
Once inside, each physical machine in our private network runs an agent that, among other responsibilities, is tasked with authenticating to our AWS-hosted Redis queue using Doppler-injected credentials and pulling job payloads from it.
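For a flavor of what that could look like, here is a rough sketch of an agent's job-pulling loop in Go using the go-redis client, assuming the credentials arrive as Doppler-injected environment variables. The queue name and payload shape are illustrative:

```go
// A rough sketch of the agent's job-pulling loop. Doppler injects secrets as
// environment variables; the queue name and payload fields are illustrative.
package main

import (
	"context"
	"crypto/tls"
	"encoding/json"
	"log"
	"os"
	"time"

	"github.com/redis/go-redis/v9"
)

type jobPayload struct {
	JobID    string `json:"job_id"`
	RunnerOS string `json:"runner_os"`
}

func main() {
	rdb := redis.NewClient(&redis.Options{
		Addr:      os.Getenv("REDIS_ADDR"),     // host:port of the AWS-hosted Redis queue
		Username:  os.Getenv("REDIS_USERNAME"), // injected by Doppler
		Password:  os.Getenv("REDIS_PASSWORD"), // injected by Doppler
		TLSConfig: &tls.Config{MinVersion: tls.VersionTLS12},
	})

	ctx := context.Background()
	for {
		// Block until a job payload is available on the queue.
		res, err := rdb.BRPop(ctx, 30*time.Second, "jobs:pending").Result()
		if err == redis.Nil {
			continue // timed out with no work; poll again
		}
		if err != nil {
			log.Printf("queue error: %v", err)
			time.Sleep(time.Second)
			continue
		}

		var job jobPayload
		if err := json.Unmarshal([]byte(res[1]), &job); err != nil {
			log.Printf("bad payload: %v", err)
			continue
		}
		log.Printf("picked up job %s, booting microVM...", job.JobID)
	}
}
```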
Once your job request is picked up by an agent, it runs your job in an ephemeral microVM managed by Firecracker, the same microVM technology AWS uses to run millions of untrusted workloads for AWS Lambda and Fargate. These microVMs leverage Kernel-based Virtual Machine (KVM) virtualization to run their own guest kernel and user space, isolated from both the host and other microVMs. This strong isolation lets us safely run multiple customer workloads on the same machine, unlike Docker, where containers share the host kernel and rely on a much thinner security boundary. Firecracker also allows us to use cgroups to enforce CPU and memory limits on each microVM, ensuring fairness across jobs and preventing noisy-neighbor problems.
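Here is a hedged sketch of what booting one of these per-job microVMs could look like with the Firecracker Go SDK, pinning the guest to a fixed vCPU and memory budget. The paths and sizes are illustrative, not our production configuration:

```go
// A minimal sketch of booting a per-job microVM with the Firecracker Go SDK.
// Paths and resource limits are illustrative.
package main

import (
	"context"
	"log"

	firecracker "github.com/firecracker-microvm/firecracker-go-sdk"
	"github.com/firecracker-microvm/firecracker-go-sdk/client/models"
)

func int64Ptr(v int64) *int64 { return &v }

func main() {
	ctx := context.Background()

	cfg := firecracker.Config{
		SocketPath:      "/tmp/job-1234.sock",
		KernelImagePath: "/var/lib/blacksmith/vmlinux", // illustrative path to a guest kernel
		KernelArgs:      "console=ttyS0 reboot=k panic=1 pci=off",
		// Root drive: a per-job clone of the runner root filesystem (illustrative path).
		Drives: firecracker.NewDrivesBuilder("/var/lib/blacksmith/rootfs-job-1234.ext4").Build(),
		MachineCfg: models.MachineConfiguration{
			VcpuCount:  int64Ptr(4),     // vCPUs exposed to this job's guest
			MemSizeMib: int64Ptr(16384), // memory ceiling for this job, in MiB
		},
	}

	m, err := firecracker.NewMachine(ctx, cfg)
	if err != nil {
		log.Fatal(err)
	}
	if err := m.Start(ctx); err != nil {
		log.Fatal(err)
	}
	defer m.StopVMM()

	// Block until the guest shuts down, i.e. the job finishes.
	if err := m.Wait(ctx); err != nil {
		log.Printf("microVM exited: %v", err)
	}
}
```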
When booted up, each microVM gets a copy-on-write clone of the root file system from GitHub's official GitHub Actions runner images, which we routinely update to pick up the latest dependency versions as GitHub releases them upstream. We also hydrate the microVM with GitHub's official GitHub Actions runner binary, which contains the logic responsible for coordinating with GitHub's control plane to adopt a job. This same GitHub-provided runner binary automatically masks any secrets in command-line output and logs.
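To illustrate the copy-on-write idea, here is one way such a clone could be produced on a reflink-capable filesystem like XFS or Btrfs. The paths are made up, and our actual cloning mechanism may differ:

```go
// One way to produce a copy-on-write clone of a base runner image on a
// reflink-capable filesystem. This is an illustrative sketch only.
package main

import (
	"fmt"
	"log"
	"os/exec"
)

func cloneRootfs(baseImage, jobID string) (string, error) {
	dst := fmt.Sprintf("/var/lib/blacksmith/rootfs-%s.ext4", jobID) // illustrative path

	// --reflink=always shares data blocks with the base image; blocks are only
	// copied when the guest writes to them, so the clone is near-instant.
	out, err := exec.Command("cp", "--reflink=always", baseImage, dst).CombinedOutput()
	if err != nil {
		return "", fmt.Errorf("reflink clone failed: %v: %s", err, out)
	}
	return dst, nil
}

func main() {
	path, err := cloneRootfs("/var/lib/blacksmith/ubuntu-22.04-runner.ext4", "job-1234")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("booting microVM with rootfs %s", path)
}
```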
Once the official GitHub Actions runner binary is up and running, we rely exclusively on the JIT token to adopt and execute a single job. While your job is running, your code and secrets are safe from the outside world since each microVM operates within its own network namespace.
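For the curious, here is a hedged sketch of the just-in-time registration flow using GitHub's documented generate-jitconfig endpoint. The runner name, labels, and token handling are illustrative; the key property is that the resulting config lets the runner accept exactly one job:

```go
// A hedged sketch of just-in-time (JIT) runner registration. GitHub's REST API
// returns an encoded JIT config that the runner inside the microVM consumes.
// Names, labels, and token handling are illustrative.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"name":            "blacksmith-job-1234", // illustrative runner name
		"runner_group_id": 1,
		"labels":          []string{"self-hosted", "blacksmith"},
	})

	req, _ := http.NewRequest(
		"POST",
		"https://api.github.com/repos/OWNER/REPO/actions/runners/generate-jitconfig",
		bytes.NewReader(body),
	)
	req.Header.Set("Accept", "application/vnd.github+json")
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GITHUB_TOKEN"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var out struct {
		EncodedJITConfig string `json:"encoded_jit_config"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}

	// Inside the microVM, the official runner consumes this config and can
	// accept exactly one job:  ./run.sh --jitconfig <encoded_jit_config>
	fmt.Println(out.EncodedJITConfig)
}
```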
If all this isn't enough, you'll be happy to know that after your job completes, the VM is destroyed along with all of its state, including its filesystem, ensuring that modifications, benign or malicious, don't persist beyond the life of the job. The one exception is the caching artifacts you can opt in to store on our disks so they can be shared across job runs, speeding up your Docker builds and, more generally, your GitHub Actions workflows.
For those not using our caching features (bad choice!), our journey is nearing its end, and you may skip the next section and go straight to the conclusion. For all others (good choice!), we have a few more things to cover regarding how we secure access to your cached artifacts across job runs.