Docker has emerged as a groundbreaking technology in modern software development, revolutionizing how applications are packaged, deployed, and managed. By popularizing containers, Docker enables developers to create portable, lightweight, and isolated environments, streamlining the entire software development lifecycle. For anyone working with Docker, understanding its underlying architecture is essential for effective troubleshooting, optimizing resource utilization, enhancing security, and orchestrating containerized applications.
Docker comprises three key components:

- Docker Client: the interface through which users issue commands to Docker.
- Docker Host: the machine that runs the Docker Daemon.
- Docker Registry: stores Docker images. When commands like `docker run` or `docker pull` are used, Docker pulls the images from a configured Docker registry. Running `docker push` will push the image into the configured Docker registry.

When you hear the term "Docker" or "Docker engine", think of a client-server application. The server in this architecture is the Docker Host, which runs the Docker Daemon. The Docker Daemon does most of the heavy lifting and can be thought of as the brain of the Docker platform. The Docker Daemon is a long-running service that exposes an API. The API is used by the Docker Client, which, as the name suggests, is the client side of the Docker engine. Some popular clients are the Docker CLI or Docker Desktop, but a client can also be an application that a developer might build to interact with the Docker Daemon directly.
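This client-server split is directly visible from the CLI itself. As a quick check (the Go-template format string below is a standard `docker version` idiom):

```
# Report the client and the server (engine) versions separately,
# reflecting the client-server architecture described above:
docker version --format 'client: {{.Client.Version}}  server: {{.Server.Version}}'
```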
Let us define some more key components that form the basis of Docker’s architecture. The Docker Client uses commands such as `docker run`, `docker build`, and `docker pull` to manage containers and images.

The interaction between these components is concise: when any Docker command is executed, the Docker CLI uses the REST API to communicate with the Docker Daemon. The Docker Daemon then processes this request, performing step-by-step actions such as pulling an image from a registry or starting a new container. The client-server model ensures efficient communication and management of Docker resources, whether the client and daemon are on the same host or on different systems. The client and daemon communicate over a UNIX socket or a network interface, allowing for Docker's diverse integration options.
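To illustrate this API surface, the daemon's REST endpoints can be queried directly over its default UNIX socket, without the CLI (the socket path below is the standard default; adjust it for your installation):

```
# Talk to dockerd directly: the /version endpoint returns the same
# data that 'docker version' displays:
curl --unix-socket /var/run/docker.sock http://localhost/version
```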
The Docker Daemon is responsible for building, running, and distributing Docker containers. The Daemon listens for API requests and processes them to create or manage containers and images. This functionality allows developers to deploy applications in isolated environments, making it easier to manage dependencies and configurations. The Daemon can also communicate with other Daemons in a distributed system, facilitating orchestration tasks across multiple hosts.
When you run commands in Docker, the Docker Daemon (`dockerd`) performs a series of internal operations to manage Docker objects such as containers, images, networks, and volumes. For example, when you run:

docker run hello-world

Here is the output of the command:
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
478afc919002: Pull complete
Digest: sha256:53cc4d415d839c98be39331c948609b659ed725170ad2ca8eb36951288f81b75
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(arm64v8)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
But what exactly is happening internally?
1. The Daemon checks whether the `hello-world` image is available locally. If not, it initiates a pull request to the configured Docker registry (the default is Docker Hub).
2. The downloaded image layers are stored in Docker's local storage directory (`/var/lib/docker`).
3. The Daemon creates a new container from the image and starts it. If the `-d` flag is specified, the container runs in detached mode, allowing it to operate in the background.

Now that we know how the Docker Daemon runs a container, let's look at the integration between the Docker Daemon and the container runtime, which is responsible for executing containers. The container runtime is a critical component of the Docker architecture, responsible for managing the execution and lifecycle of containers. In Docker, the default container runtime is `runc`, which operates in conjunction with `containerd`, an intermediate layer that interfaces with the Docker Daemon (`dockerd`). When a user issues a command (e.g., `docker run`), the Docker Daemon processes the request and communicates with `containerd` to create and manage the container. The `containerd` layer handles the lower-level operations, such as pulling images and starting containers, while `runc` is responsible for creating the container's environment and executing its processes. This collaboration ensures that containers are isolated, resource-managed, and run efficiently, utilizing Linux kernel features like namespaces and cgroups.
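On a typical installation you can see this stack reported by the daemon itself:

```
# Shows the registered runtimes (runc), the default runtime, and the
# containerd version the daemon is wired to (output varies by install):
docker info | grep -iE 'runtime|containerd'
```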
Additionally, Docker supports alternative container runtimes, enabling users to customize their environment as needed. Overall, this architecture ensures efficient and secure execution of containerized applications while allowing for flexibility in runtime selection.
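As a sketch of how an alternative runtime is registered, the daemon's configuration file accepts a `runtimes` map; the runtime name and binary path below are assumptions for illustration, and you should merge this with any existing configuration rather than overwrite it:

```
# Hypothetical /etc/docker/daemon.json entry registering a custom runtime:
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "runtimes": {
    "custom-runc": { "path": "/usr/local/bin/custom-runc" }
  }
}
EOF
sudo systemctl restart docker
# Select the runtime per container:
docker run --runtime=custom-runc hello-world
```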
Let’s delve into the complete process of building, pulling, and storing images in Docker, focusing on how the Docker Daemon handles these operations when commands are executed. We will cover both the `docker build` and `docker pull` commands in detail.
Building an image
Let's consider the command:
docker build -t sample_image:latest .
This command builds a Docker image from a `Dockerfile` located in the current directory (denoted by `.`) and tags it as `sample_image:latest`.
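For concreteness, here is a minimal, hypothetical `Dockerfile` that such a build might consume; the file contents and the `app.sh` script are assumptions for illustration:

```
# Write a minimal Dockerfile into the build context:
cat > Dockerfile <<'EOF'
FROM ubuntu
COPY app.sh /app/app.sh
RUN chmod +x /app/app.sh
CMD ["/app/app.sh"]
EOF
```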
But what happens internally?
1. When the `docker build` command is executed, the Docker client sends a request to the Docker Daemon via the Docker API to initiate the build process.
2. The Daemon reads the `Dockerfile` from the specified context (in this case, the current directory). It parses the `Dockerfile` to understand the instructions provided, such as `FROM`, `RUN`, `COPY`, and `CMD`.
3. If the `Dockerfile` specifies a base image (e.g., `FROM ubuntu`), the Daemon checks if this base image is available locally. If it is not found, the Daemon pulls it from the Docker registry, following the process outlined in the previous section (checking the registry, downloading layers, etc.).
4. As the Daemon executes the instructions in the `Dockerfile`, it creates a new layer for each command. For example:
   - `RUN`: executes commands in a new layer and commits the changes.
   - `COPY`: copies files from the host into the image, creating another layer.
   - `ENV`: sets environment variables, which also results in a new layer.
5. These layers are stored in Docker's local storage directory (typically `/var/lib/docker`). Docker uses a union filesystem (like OverlayFS) to manage these layers, allowing for efficient storage and retrieval. This approach minimizes duplication of data and enables quick access to the layers that make up an image. By stacking layers, Docker can efficiently manage changes and updates, saving both disk space and time during builds.
6. After processing the last instruction in the `Dockerfile`, the Daemon updates its internal metadata to reflect the new image, including its layers, tags, and configurations.

Pulling an image
Now, let’s revisit the `docker pull` command:
docker pull ubuntu
When you run this command, the output in the terminal will be similar to this:
latest: Pulling from library/ubuntu
9f23a71f1e31: Pull complete
Digest: sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee
Status: Downloaded newer image for ubuntu:latest
docker.io/library/ubuntu:latest
But here is the internal process that takes place:

1. When the `docker pull ubuntu` command is executed, the Docker client sends a request to the Docker Daemon via the Docker API to retrieve the `ubuntu` image from a Docker registry.
2. The Daemon downloads the image layers and stores them in Docker's local storage directory (typically `/var/lib/docker`). The layers are downloaded using HTTP requests, and the Daemon manages the progress and integrity of each layer by verifying checksums.
3. Once all layers are in place, the `ubuntu` image has been successfully pulled and is now available for use.

Both the `docker build` and `docker pull` commands involve intricate processes managed by the Docker Daemon. When building an image, the Daemon parses the `Dockerfile`, retrieves any necessary base images, creates layers based on the specified instructions, and updates its metadata. In contrast, when pulling an image, the Daemon communicates with the Docker registry, downloads the required layers, and stores them in the local storage directory.
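One way to observe the result of these layer operations is `docker history`, which lists the layers of a local image:

```
# Each row corresponds to an image layer created by a Dockerfile
# instruction; the sizes show how layers share and add data:
docker history ubuntu
```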
Docker supports several network types, each tailored for specific use cases. The Docker Daemon is responsible for managing these network types and ensuring proper communication between containers and the host system.
Bridge Network
The `bridge` network is the default network type in Docker. When a container is created without specifying a network, it is automatically attached to the `bridge` network. For example, when you run the following command, the Docker Daemon performs these steps to create and manage the bridge network:
docker run -d --name web --network bridge nginx
1. The Daemon creates a Linux bridge interface (`docker0`) on the host system.
2. It assigns a subnet (e.g., `172.17.0.0/16`) to the bridge interface.
3. It configures the rules that allow traffic to flow between the `bridge` network and the host system.
4. When a container is attached to the `bridge` network, the Daemon creates a virtual Ethernet pair (veth) interface. One end of the veth pair is attached to the container's network namespace, while the other end is attached to the `docker0` bridge on the host.
5. The Daemon allocates an IP address from the `bridge` network's IP range and assigns it to the container's network interface.

The bridge driver is used for creating bridge networks. It creates a Linux bridge interface on the host system and manages the IP address range for the network. The bridge driver is responsible for creating veth pair interfaces for containers and attaching them to the `docker0` bridge.
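You can inspect the objects the Daemon creates for this network directly; the interface name and subnet shown may vary by installation:

```
# Show the subnet, gateway, and connected containers of the default
# bridge network:
docker network inspect bridge
# The Linux bridge interface the daemon created on the host:
ip addr show docker0
```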
Host Network
The host network mode allows a container to use the host system's network stack directly, bypassing Docker's network isolation. The Docker Daemon performs the following steps when running a container in host mode:
The host driver is used for running containers in host network mode. It does not create any additional network interfaces or perform any network isolation, as the container directly uses the host system's network stack.
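As a minimal sketch of what this means in practice on a Linux host (the container name is illustrative), a service in a host-mode container binds directly to the host's ports:

```
# nginx listens on the host's port 80 directly; no -p publishing is
# needed because there is no network isolation to cross:
docker run -d --name web_host --network host nginx
curl http://localhost:80
```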
Overlay Network
The overlay network is used for multi-host networking in Docker Swarm. Docker Swarm is a built-in clustering and orchestration tool for managing multiple Docker hosts. It enables you to group several Docker Engines into a single virtual entity, allowing containers to communicate across different hosts while ensuring high availability and load balancing. In this respect it resembles Kubernetes; the fundamental differences between Docker Swarm and Kubernetes lie in architecture and scalability.
Docker Swarm uses a simpler architecture and scaling method, relying on a swarm of nodes that can be easily added or removed, making it straightforward to set up and manage. This simplicity makes Docker Swarm an ideal choice for smaller projects and teams. In contrast, Kubernetes employs a more complex architecture that offers advanced features such as pod replication, self-healing, and automatic rollouts and rollbacks. It is better suited for larger and more complex applications that require the management of stateful workloads, like databases, which need unique network configurations.
An overlay network allows containers on different Docker hosts to communicate securely. The Docker Daemon delegates the creation and management of overlay networks to the overlay driver:
The overlay driver is used for creating overlay networks in Docker Swarm. It enables secure communication between containers on different Docker hosts by creating an encrypted overlay network. The overlay driver also configures the embedded DNS server for service discovery and sets up the routing mesh for load balancing.
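A minimal sketch of creating an overlay network on a Linux host (the network and container names are illustrative):

```
# Overlay networks require Swarm mode:
docker swarm init
# --attachable allows standalone containers, not just Swarm services,
# to join the network:
docker network create -d overlay --attachable my_overlay
docker run -d --name web_overlay --network my_overlay nginx
```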
Apart from these drivers, the Docker Daemon also supports the MACVLAN driver, which allows containers to have their own MAC addresses, making them appear as physical network devices on the host system's network. This driver is useful when you need to integrate Docker containers with existing network infrastructure that relies on MAC addresses for routing or security policies. For example, if you have a legacy application that requires specific network configurations or firewall rules based on MAC addresses, MACVLAN lets your containers connect to the network seamlessly, just like any physical device. This capability can simplify the integration of containerized applications into traditional network setups.
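A hedged example of creating a MACVLAN network; the subnet, gateway, and parent interface below are assumptions that must match your physical network:

```
docker network create -d macvlan \
  --subnet=192.168.1.0/24 \
  --gateway=192.168.1.1 \
  -o parent=eth0 my_macvlan
# The container now appears on the physical network with its own MAC:
docker run -d --name web_macvlan --network my_macvlan nginx
```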
To understand how Docker manages persistent storage through data volumes and bind mounts, we need to explore the role of the Docker Daemon (`dockerd`) in handling these storage options, including the internal processes that occur when storage-related commands are executed. Docker containers are ephemeral by nature, meaning that any data stored within a container's writable layer is lost when the container is removed. To address this, Docker provides two primary mechanisms for persistent storage: data volumes and bind mounts. Choosing between them depends on your specific needs: data volumes offer better management and isolation, while bind mounts offer direct access to the host filesystem.
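To make the distinction concrete, the two mechanisms differ only in how the source side of the mount is specified; the volume name and host path below are illustrative:

```
# Data volume: Docker manages the source under /var/lib/docker/volumes/
docker run --rm -v my_volume:/data alpine ls /data
# Bind mount: the source is an existing host path you manage yourself
docker run --rm -v /path/on/host:/data alpine ls /data
# The equivalent, more explicit --mount syntax for a volume:
docker run --rm --mount type=volume,source=my_volume,target=/data alpine ls /data
```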
Data volumes
These are stored in a part of the host filesystem managed by Docker, typically under `/var/lib/docker/volumes/`. They are designed to persist data beyond the lifecycle of a single container and can be shared among multiple containers. The Docker Daemon manages these volumes, ensuring data integrity and isolation.
Creating a data volume
When you execute the following command:
docker volume create my_volume
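You can confirm where the Daemon placed the volume with `docker volume inspect` (the mountpoint path may vary by installation):

```
# The Mountpoint field typically points under /var/lib/docker/volumes/:
docker volume inspect my_volume
```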
Here are the things that happen internally:

- The Docker Daemon creates a directory for the volume on the host, at `/var/lib/docker/volumes/my_volume/`, and records the new volume in its metadata.

Using a Data Volume
To use the created volume in a container, you can run:
docker run -d --name my_container -v my_volume:/data alpine
Here is what happens internally:

- The Daemon verifies that `my_volume` exists and prepares to mount it to the `/data` directory in the container.

Bind mounts
You can specify a directory on the host filesystem that is mounted into a container. This gives you direct access to the host's filesystem from within the container. Unlike volumes, bind mounts can be located anywhere on the host filesystem.
Using a Bind Mount
To run a container with a bind mount, you can execute:
docker run -d --name my_container -v /path/on/host:/data alpine
Here are the things that happen internally:

- The Daemon checks whether the specified path (`/path/on/host`) exists on the host filesystem.

Now that we understand the internal processes involved in using bind mounts, it's important to note that Docker recommends data volumes over bind mounts for several reasons:
- Volumes live in a part of the host filesystem managed by Docker (`/var/lib/docker/volumes/`), allowing Docker to handle the lifecycle and cleanup of volumes automatically. This reduces the risk of orphaned data and makes it easier to manage storage.

We examined how the Docker Daemon interacts with various components, including the Docker Client and Docker Registry, to facilitate seamless operations such as building images, pulling images from registries, and managing container lifecycles. Additionally, we discussed the different types of Docker networks and networking drivers, highlighting how they enable effective communication between containers and the host system. Understanding these foundational concepts is essential for optimizing resource utilization, enhancing security, and orchestrating containerized applications effectively. This knowledge sets the stage for further exploration into more advanced topics.
In Part 2 of this series, we will delve into Communication and Security as well as Monitoring and Troubleshooting, along with extending the capabilities of the Docker Daemon. These topics will provide you with practical insights and techniques to enhance your Docker management skills and optimize the performance of your containerized applications.