Optimizing Your Docker Builds

Docker has changed how we package, distribute, and deploy applications. At its core is Docker's build process, where a Docker image is created from a Dockerfile. In this post, we'll explore several strategies for optimizing Docker builds, making them faster and the resulting images more space-efficient.
Before diving into optimization strategies, let's understand the most essential instructions that make up a Dockerfile:
FROM: Specifies the base image to use as the starting point for the build. This is typically the first instruction in a Dockerfile.
RUN: Executes a command inside the image during the build process. Each RUN instruction creates a new layer in the image.
COPY: Copies files or directories from the host machine to the image. It's the preferred instruction for copying files.
CMD: Specifies the default command to run when a container is started from the image. There can only be one CMD instruction in a Dockerfile.

These four instructions form the core of most Dockerfiles. They allow you to select a base image, execute commands to install dependencies and configure the image, copy necessary files, and define the default command to run when a container is started.
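To make that concrete, here is a minimal sketch of a Dockerfile that uses all four instructions (plus WORKDIR to set the working directory); the server.js entrypoint is just a placeholder for this example.

# Base image to build on
FROM node:14
WORKDIR /app
# Copy the application source from the host into the image
COPY . .
# Install dependencies during the build; this creates a new layer
RUN npm install
# Default command when a container is started from the image
CMD ["node", "server.js"]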
Understanding Docker Build Cache

Docker uses a build cache to speed up the build process. Each instruction in your Dockerfile is treated as a separate layer. If a layer hasn't changed, Docker caches it and reuses it in subsequent builds. This means that if you modify an instruction in your Dockerfile, Docker will use the cache for all the layers before the modified instruction and rebuild the layers after it. This happens because each layer depends on the layers before it: modifying an instruction changes the state of the image, and subsequent layers need to be rebuilt to ensure consistency and include the changes.
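As a small illustration (the package and paths here are placeholders), consider what happens on a rebuild after only the files under app/ have changed:

# Reused from cache: the base image reference is unchanged
FROM ubuntu:20.04
# Reused from cache: the instruction and its inputs are identical to the previous build
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
# Rebuilt: the checksum of the copied files changed, so this layer (and any layer after it) is rebuilt
COPY app/ /app/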
How to Structure Your Dockerfile

The order of instructions in your Dockerfile matters. You should structure your Dockerfile in a way that maximizes cache utilization. Here are a few ways to do that:
1. Order from least frequently changing to most

Place instructions less likely to change at the top of your Dockerfile. For example, if your application dependencies don't change frequently, install them early in the Dockerfile. This way, Docker can reuse the cached layers for these instructions in subsequent builds.
2. Increase cache hits by separating dependencies

Let's consider the following example:
FROM node:14
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
This setup bundles the application code and dependencies, leading to inefficient caching. The COPY . . command copies the entire project, including node_modules if it exists locally. As a result, any modification to the application code forces Docker to re-run npm install, even if package.json and package-lock.json haven't changed. If your application code changes frequently but your dependencies don't, this approach rebuilds the dependencies unnecessarily on every change. Here's a more optimized version:

FROM node:14
WORKDIR /app
COPY package.json package-lock.json /app/
RUN npm install
COPY . .
RUN npm run build
We copy the package.json and package-lock.json files before copying the entire project directory. This allows Docker to cache the RUN npm install instruction separately from the application code. If package.json and package-lock.json haven't changed, Docker can reuse the cached layer for RUN npm install, even if the application code has changed. Because the COPY . . instruction is placed after RUN npm install, changes to the application code won't invalidate the cache for the dependency installation step.

3. Group related commands

Group related commands together in a single RUN instruction. Each RUN instruction creates a new layer, so combining related commands reduces the number of layers. Fewer layers mean less overhead. Just as importantly, files deleted in a later layer still occupy space in the layer that created them, so cleanup commands such as apt-get clean only shrink the image when they run in the same RUN instruction as the installation. Consider this example:
RUN apt-get update
RUN apt-get install -y package1
RUN apt-get install -y package2
RUN apt-get clean
Instead of creating four layers, you can combine these related commands into a single RUN instruction:

RUN apt-get update && \
apt-get install -y package1 package2 && \
apt-get clean
4. Use multi-stage builds to separate build and runtime environments

Multi-stage builds allow you to create smaller and more efficient Docker images by separating the build and runtime environments. This is particularly useful when you have build dependencies that are not needed at runtime. Before multi-stage builds, you might have a Dockerfile like this:
FROM node:14
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
CMD [ "npm" , "start" ]
In this example, the final image includes the entire Node.js build environment, which is unnecessary for running the application. With multi-stage builds, you can split the Dockerfile into multiple stages:

# Build stage
FROM node:14 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Runtime stage
FROM node:14-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY package*.json ./
RUN npm ci --only=production
CMD [ "npm" , "start" ]
In this optimized version:

The build stage uses the full Node.js image to install dependencies and build the application.
The runtime stage uses a minimal Node.js image (node:14-alpine) and copies only the built files and production dependencies from the build stage.

The resulting image is significantly smaller because it doesn't include the build tools and intermediate artifacts.
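With this layout you can build the final image as usual, or target an individual stage to inspect it; the my-app tag below is just a placeholder:

# Build the final runtime image
docker build -t my-app .
# Build only up to the "build" stage, e.g. to debug a failing npm run build
docker build --target build -t my-app:build .
# Compare the sizes of the two images
docker images my-app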
5. Use multi-stage builds to parallelize build stages

BuildKit, the default build backend in current versions of Docker, can execute independent stages of a multi-stage build in parallel. Consider a Dockerfile that builds a Go application with multiple components:
# syntax=docker/dockerfile:1.4
FROM golang:1.16 AS base
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
FROM base AS build-api
COPY api/ /src/api/
RUN go build -o /bin/api /src/api
FROM base AS build-worker
COPY worker/ /src/worker/
RUN go build -o /bin/worker /src/worker
FROM gcr.io/distroless/base-debian11
COPY --from=build-api /bin/api /
COPY --from=build-worker /bin/worker /
In this example:

The base stage sets up the common dependencies by copying the go.mod and go.sum files and downloading the required packages.
The build-api and build-worker stages depend only on the base stage, so they can be executed in parallel. Each stage builds a specific component of the application.
The final stage uses a minimal base image (gcr.io/distroless/base-debian11) and copies the compiled binaries from the build-api and build-worker stages.

With BuildKit, Docker will automatically detect the independent stages and execute them in parallel, speeding up the overall build process.
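On older Docker releases where BuildKit isn't the default builder, you can opt in per build; the image tag here is just a placeholder:

# Enable BuildKit explicitly for this build
DOCKER_BUILDKIT=1 docker build -t my-go-services .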
Minimizing the Docker Image Size

Smaller Docker images are faster to build, push, and pull. They also consume less disk space. Here are some strategies to minimize your image size:
Use the right base image: Choose a minimal base image that includes only the necessary dependencies for your application. For example, if you're running a Go application, consider using the official golang:alpine image instead of the full-fledged golang image. Alpine Linux images generally contain fewer security vulnerabilities due to their reduced attack surface: Alpine includes only essential packages by default, so there are fewer pre-installed components that your application may not need and less surface area for potential vulnerabilities.

Combine RUN commands: As we mentioned before, each RUN instruction in your Dockerfile creates a new layer. More layers result in more overhead and a larger image size. Combine multiple RUN instructions into a single instruction to reduce the number of layers.

Clean up after package installations: When installing packages in your Dockerfile, it's useful to clean up the package cache and temporary files in the same RUN instruction. This practice reduces the size of your final Docker image by removing files that are not needed at runtime. Consider the following example:

FROM ubuntu:20.04
RUN apt-get update && \
apt-get install -y curl && \
rm -rf /var/lib/apt/lists/*
In this Dockerfile, we update the package lists with apt-get update and install the curl package with apt-get install. After the installation, we remove the package lists stored in /var/lib/apt/lists/ with rm -rf /var/lib/apt/lists/*. This cleanup step removes the cached package lists, which are not required once the packages are installed, and prevents them from being included in the final image layer. This can significantly reduce the image size, especially when installing multiple packages or dealing with large package repositories. The multi-stage build approach we mentioned before sidesteps this problem entirely by copying into the final image only what's required at runtime.

Use the Right File-Copying Strategy

When copying files into your Docker image, use the appropriate instructions:
COPY vs. ADD: Use COPY for most cases, as it's straightforward and only copies files from the host to the image. ADD can perform extra tasks, such as extracting a local tar file directly into the container's filesystem or reading from remote URLs, which can result in unwanted artifacts ending up in your Docker image.

Use .dockerignore: Create a .dockerignore file in your build context to exclude unnecessary files and directories from being copied into the image. This reduces the build context size, speeds up the build process, and keeps your images lean. Some common files to ignore are listed below, followed by a sample .dockerignore:

Version Control Directories: Exclude directories such as .git, .svn, or .hg that contain version control metadata.
Build Artifacts: Exclude files and directories generated by build processes, such as node_modules, dist, build, or target directories.
Temporary Files: Exclude temporary files and directories like .tmp, .log, or .swp.
Configuration Files: Exclude files that are not needed in the production environment, such as .env files containing development configurations or secrets.
Documentation: Exclude documentation files like README.md, unless they are needed in the image.
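Here is a minimal sketch of such a .dockerignore; the exact entries depend on your project's layout:

# Version control metadata
.git
.svn
.hg
# Dependencies and build output
node_modules
dist
build
target
# Temporary and log files
*.tmp
*.log
*.swp
# Local configuration and secrets
.env
# Documentation
README.md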
To summarize, the order of instructions in your Dockerfile can drastically improve your cache efficiency, and your Docker image only needs to contain runtime dependencies, not build dependencies. A smaller image is also faster to push and pull, which contributes to better developer productivity.