[ Customer story ]

How Celery made their GitHub Actions 4x faster and stopped waiting 4 hours on PRs with Blacksmith

Industry: Open Source
Developers: 1180
Previous: GitHub-hosted
Problem
Celery's open-source community faced major reliability and performance issues with its CI testing infrastructure in GitHub Actions as it parallelized more and more jobs, resulting in flaky tests and pull requests waiting up to four hours for compute resources.
Solution
Now, through our new Open Source Sponsorship Program, Celery’s maintainers can commit code in minutes instead of hours, without worrying about unreliable infrastructure causing flaky tests or limits on the number of jobs they can run in parallel.
Results
4x faster deployment times
4x contribution rate

Celery is a popular Open Source Distributed Task Queue written in Python. It’s a piece of critical software infrastructure used by large companies like Bloomberg, Robinhood, Wolt, Udemy, Sentry, GitGuardian, Semgrep, and many more to distribute work across threads or machines at scale.


As an Open Source project on GitHub, Celery was initially reaping the benefits of GitHub’s Free Plan. However, with great popularity come great reliability requirements for the project, along with a rapidly growing QA infrastructure to guarantee them. And so, with the collaboration of a sponsor, Celery began building and integrating a new, large, enterprise-level QA infrastructure.

Celery’s new, shiny QA infrastructure served dual purposes: enabling the core team to use it as part of their CI in order to provide strong reliability guarantees to enterprise customers at every release, and allowing those same customers to use it directly to simulate production-like environments that included not only Celery but also their own application logic. As this resource-intensive QA infrastructure grew, Celery unsurprisingly began pushing the limits of GitHub’s Free Plan.

Problem: When GitHub’s free plan isn’t enough

For a long time now, GitHub has been synonymous with Open Source, and rightfully so. For years, they’ve offered free plans for all public repositories, making them culturally dominant in the Open Source community. So, it’s no surprise that, before Blacksmith, Celery relied on GitHub-hosted runners. However, it’s safe to say that GitHub has recently dropped the ball. While the infrastructure demands for OSS projects have skyrocketed, GitHub’s server hardware has remained… well, old, old, and older. Perhaps their preferred adjective is vintage.

As a result, Celery’s new QA infrastructure began to feel GitHub’s limitations. They first started to experience a significant number of infrastructure instabilities with GitHub-hosted runners, leading to flaky issues like false positives and random interruptions. Their CI jobs couldn’t be trusted to provide reproducible results. On top of that, they were facing massive queuing issues. With the Free Plan, you can run a total of 20 concurrent jobs. Even if you decided to pay for the next level (the Pro Plan), you would still be limited to 40 concurrent jobs. If this limit is exceeded, any additional jobs are queued.

To give you a sense of Celery’s scale: when a new PR is opened, it runs about 30 simulated production environments in parallel, each with different configurations, multiple Docker containers, and a matrix of Python versions from 3.8 to 3.13 — totaling up to 120 jobs just to test Celery.

So just a few PRs could choke the entire organization and cause massive queuing, waiting for GitHub’s standard runners. The situation became so bad that, if three PRs were created simultaneously, the third one could wait up to 4 hours just to have VMs provisioned! You can imagine debugging issues wasn’t a pleasant endeavor.
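To make the queuing arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. It takes the figures quoted in this story (up to 120 jobs per PR, three PRs opened at once, and concurrency limits of 20 and 40) and estimates how many back-to-back batches of jobs it would take to drain the queue. The function names are hypothetical, and the model assumes every job takes the same time and that runners fill in whole batches, which real CI rarely does, so treat the numbers as illustrative rather than a measurement.

```python
import math


def queue_waves(total_jobs: int, concurrency_limit: int) -> int:
    """Back-to-back batches needed when at most `concurrency_limit` jobs run at once.

    Hypothetical helper: assumes all jobs take the same time (a simplification).
    """
    if concurrency_limit <= 0:
        raise ValueError("concurrency_limit must be positive")
    return math.ceil(total_jobs / concurrency_limit)


def jobs_queued_at_start(total_jobs: int, concurrency_limit: int) -> int:
    """Jobs that cannot start immediately and have to wait for a free runner."""
    return max(0, total_jobs - concurrency_limit)


JOBS_PER_PR = 120  # "up to 120 jobs" per PR, as quoted in the story
OPEN_PRS = 3       # the three-simultaneous-PRs scenario from the story
LIMITS = {"Free": 20, "Pro": 40}  # concurrent-job limits quoted in the story

total_jobs = JOBS_PER_PR * OPEN_PRS

for plan, limit in LIMITS.items():
    waves = queue_waves(total_jobs, limit)
    waiting = jobs_queued_at_start(total_jobs, limit)
    print(f"{plan}: {total_jobs} jobs -> {waves} batches, {waiting} queued at the start")
```

Under these assumptions, even the higher limit leaves most jobs waiting behind several full batches, which is consistent with the long pickup delays described above.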

"We were unable to unleash the full potential of the new QA infrastructure due to the limits of GitHub-hosted runners."
Tomer Nosrati, Tech Lead, Celery

Solution: Selecting Blacksmith over GitHub’s free plan

Beyond general performance and reliability improvements, Tomer desperately wanted the ability to run multiple jobs simultaneously without hitting any concurrency limits. One of his top priorities was maintaining the same SLA without compromising on quality, and avoiding contributions getting queued up like an endless line at a coffee shop. But, as an open-source project with minimal funding, the options were scarce…


With all their GitHub Actions issues seemingly solved by Blacksmith’s value proposition, Tomer reached out to us via email for help with this major bottleneck in their development process. We’re definitely not ones to sing the praises of GitHub-hosted runners, but this situation felt ridiculous, especially for such a large and beloved open-source project to be struggling this much just to test their code with GitHub Actions.

As we spoke with more and more open-source projects, we quickly realized that performance and reliability issues with GitHub Actions weren't limited to Celery. With that in mind, we decided not only to sponsor Celery’s usage of Blacksmith but also to launch an Open Source sponsorship program, where we give away Blacksmith's compute and storage solutions to Open Source projects that need them to do the job GitHub’s Free Plan no longer can. Instead of diving into all the reasons why, we'll leave you with this iconic meme:

After taking care of a few logistics on our end, Celery wasted very little time before creating a pull request with our migration wizard, which automatically migrated all of their workflows onto Blacksmith!

"The integration was smooth and effortless. With just a few clicks, we upgraded our entire organization."
Tomer Nosrati, Tech Lead, Celery

Results: Merging PRs faster with 0 queued jobs

Now, every Celery contributor around the globe enjoys a 4x performance boost to their CI, thanks to Blacksmith’s high-performance gaming CPUs, unlimited concurrency, and extensive caching optimizations. Contributors can spin up dozens of PRs and hundreds of CI jobs to verify their contribution using the new QA infrastructure without queuing. A PR that would normally take around 30 minutes (and up to 4 hours just to be picked up when there were multiple PRs) now zooms through in just 7 minutes!

In the end, faster GitHub Actions improved the project’s SLA and QA. As Tomer likes to say, “Performance is experience.” Not to mention, with better reliability than GitHub-hosted runners, they’ve ditched old hacks and workarounds they always intended to address but could never seem to find the time for, making their CI environment a lot simpler to maintain. They could also tune their testing infrastructure back up to its max settings, no longer trading off reliability for performance.

"After switching to Blacksmith, everything else just feels primitive."
Tomer Nosrati, Tech Lead, Celery

If your OSS project — or one you love — is struggling with CI, let us lend a hand. Nominations are open here.


Start with 3,000 free minutes per month or book a live demo with our engineers