Docker has fundamentally changed how we package, distribute, and deploy applications. At the core of this workflow is Docker's build process, in which an image is created from a Dockerfile. In this post, we'll explore several strategies to optimize Docker builds, making them faster and the resulting images more space-efficient.
Before diving into optimization strategies, let's understand the four essential instructions that make up most Dockerfiles: `FROM`, `RUN`, `COPY`, and `CMD`. They allow you to select a base image, execute commands to install dependencies and configure the image, copy necessary files, and define the default command to run when a container starts.
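As a minimal sketch, a Dockerfile for a hypothetical Node.js service might combine all four (the image tag and file names are illustrative):

```dockerfile
# Select the base image
FROM node:14

# Copy the application files into the image
COPY . /app

# Execute a command to install dependencies
RUN cd /app && npm install

# Define the default command run when a container starts
CMD ["node", "/app/server.js"]
```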
Docker uses a build cache to speed up builds. Each instruction in your Dockerfile produces a separate layer, and Docker caches these layers and reuses them in subsequent builds when nothing has changed. If you modify an instruction, Docker will use the cache for all the layers before the modified instruction and rebuild the layers after it. This happens because each layer depends on the layers before it: modifying an instruction changes the state of the image, so subsequent layers must be rebuilt to ensure consistency and include the changes.
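To make this concrete, here is a sketch of how a change propagates through the cache (the file and package names are illustrative; the comments are annotations, not build directives):

```dockerfile
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3
# If only app.py changed since the last build, the two layers
# above are served from the cache...
COPY app.py /app/
# ...while this layer and every layer after it are rebuilt.
RUN chmod +x /app/app.py
```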
The order of instructions in your Dockerfile matters. You should structure your Dockerfile in a way that maximizes cache utilization. Here are a few ways to do that:
Place instructions less likely to change at the top of your Dockerfile. For example, if your application dependencies don't change frequently, install them early in the Dockerfile. This way, Docker can reuse the cached layers for these instructions in subsequent builds.
Let's consider an example where the application code and dependencies are copied together, leading to inefficient caching. A `COPY . .` command copies the entire project, including `node_modules` if it exists locally. As a result, even if `package.json` and `package-lock.json` remain unchanged, any modification to the application code forces Docker to re-run `npm install`, even though the dependencies haven't changed. If your application code changes frequently but dependencies remain stable, this approach unnecessarily rebuilds dependencies on every change.
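A sketch of this unoptimized setup (mirroring the optimized example that follows):

```dockerfile
FROM node:14
WORKDIR /app

# Copying everything first ties dependencies to the application code:
# any source change invalidates the cache for npm install below.
COPY . .
RUN npm install
RUN npm run build
```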
Here's a more optimized version:

```dockerfile
FROM node:14
WORKDIR /app
COPY package.json package-lock.json /app/
RUN npm install
COPY . .
RUN npm run build
```
Here, we copy the `package.json` and `package-lock.json` files first, before copying the entire project directory. This allows Docker to cache the `RUN npm install` instruction separately from the application code. If `package.json` and `package-lock.json` haven't changed, Docker can reuse the cached layer for `RUN npm install`, even if the application code has changed. The `COPY . .` instruction is placed after `RUN npm install`, so changes to the application code won't invalidate the cache for the dependency installation step.

Group related commands together in a single `RUN` instruction. Each `RUN` instruction creates a new layer, so combining related commands reduces the number of layers. Fewer layers mean a smaller image and fewer opportunities for a cache miss.
Consider an example where each package-management command runs in its own `RUN` instruction.
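Such a Dockerfile fragment might look like this (the package names are placeholders):

```dockerfile
# Each command below creates its own layer: four layers in total.
RUN apt-get update
RUN apt-get install -y package1
RUN apt-get install -y package2
RUN apt-get clean
```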
Instead of creating four layers, you can combine these related commands into a single `RUN` instruction:
```dockerfile
RUN apt-get update && \
    apt-get install -y package1 package2 && \
    apt-get clean
```
Multi-stage builds allow you to create smaller and more efficient Docker images by separating the build and runtime environments. This is particularly useful when you have build dependencies that are not needed at runtime.
Before multi-stage builds, you might have a Dockerfile like this:
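A sketch of such a single-stage Dockerfile (assuming the same Node.js application as above):

```dockerfile
FROM node:14
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
CMD ["npm", "start"]
```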
In this example, the final image includes the entire Node.js build environment, which is unnecessary for running the application.
With multi-stage builds, you can split the Dockerfile into multiple stages:
```dockerfile
# Build stage
FROM node:14 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage
FROM node:14-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY package*.json ./
RUN npm ci --only=production
CMD ["npm", "start"]
```
In this optimized version:

- The `build` stage uses the full Node.js image to install dependencies and build the application.
- The `runtime` stage uses a minimal Node.js image (`node:14-alpine`) and copies only the built files and production dependencies from the `build` stage.

The resulting image is significantly smaller because it doesn't include the build tools and intermediate artifacts.
BuildKit is the default build backend in recent Docker versions, and it can execute independent stages of a multi-stage build in parallel. (On older versions, you can enable it with `DOCKER_BUILDKIT=1 docker build .`.)
Consider a Dockerfile that builds a Go application with multiple components.
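Here is a sketch of such a Dockerfile, reconstructed from the stage names described below; the Go version tag, module layout, and `cmd/` paths are assumptions:

```dockerfile
# Common dependencies shared by both components
FROM golang:1.21 AS base
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download

# These two stages depend only on "base", so BuildKit can run them in parallel
FROM base AS build-api
COPY . .
RUN go build -o /out/api ./cmd/api

FROM base AS build-worker
COPY . .
RUN go build -o /out/worker ./cmd/worker

# Final stage: a minimal image containing only the compiled binaries
FROM gcr.io/distroless/base-debian11
COPY --from=build-api /out/api /api
COPY --from=build-worker /out/worker /worker
CMD ["/api"]
```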
In this example:

- The `base` stage sets up the common dependencies by copying the `go.mod` and `go.sum` files and downloading the required packages.
- The `build-api` and `build-worker` stages can be executed in parallel, as they depend only on the `base` stage. Each stage builds a specific component of the application.
- The final stage uses a minimal base image (`gcr.io/distroless/base-debian11`) and copies the compiled binaries from the `build-api` and `build-worker` stages.

With BuildKit, Docker will automatically detect the independent stages and execute them in parallel, speeding up the overall build process.
Smaller Docker images are faster to build, push, and pull. They also consume less disk space and memory. Here are some strategies to minimize your image size:
- Use a minimal base image. For example, use the `golang:alpine` image instead of the full-fledged `golang` image.
- Clean up temporary files in the same `RUN` instruction. This practice helps reduce the size of your final Docker image by removing files that are not needed at runtime. For example, in `RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*`, we run `apt-get update` and install the `curl` package using `apt-get install`; after the installation, we remove the package lists stored in `/var/lib/apt/lists/` using `rm -rf /var/lib/apt/lists/*`. This cleanup step removes the cached package lists, which are not required once the packages are installed.
- When copying files into your Docker image, use the appropriate instruction. Prefer `COPY` for most cases, as it's straightforward and only copies files from the host to the image. `ADD` can perform extra tasks, such as extracting a local tar file directly into the container's filesystem or reading from remote URLs, which can result in unwanted artifacts ending up in your Docker image.
- Exclude unnecessary files from the build context (for example, with a `.dockerignore` file), such as:
  - `.git`, `.svn`, or `.hg` directories that contain version control metadata
  - `node_modules`, `dist`, `build`, or `target` directories
  - temporary files like `.tmp`, `.log`, or `.swp`
  - `.env` files containing development configurations or secrets
  - documentation files like `README.md`, unless they are needed in the image

To summarize, the order of instructions in your Dockerfile can drastically improve your cache efficiency, and your Docker image only needs to contain runtime dependencies, not build dependencies. A smaller image is also faster to pull and push, which contributes to better developer productivity.