Docker has emerged as a groundbreaking technology in modern software development, revolutionizing how applications are packaged, deployed, and managed. By popularizing containers, Docker enables developers to create portable, lightweight, and isolated environments, streamlining the entire software development lifecycle. For anyone working with Docker, understanding its underlying architecture is essential for effective troubleshooting, optimizing resource utilization, enhancing security, and orchestrating containerized applications.
Docker comprises three key components:

- Docker Client: the interface through which users issue commands to Docker.
- Docker Host: the machine that runs the Docker Daemon.
- Docker Registry: stores Docker images. When commands like `docker run` or `docker pull` are used, Docker pulls the images from a configured Docker registry. Running `docker push` will push the image into the configured Docker registry.

When you hear the term "Docker" or "Docker engine", think of a client-server application. The server in this architecture is the Docker Host, which runs the Docker Daemon. The Docker Daemon does most of the heavy lifting and can be thought of as the brain of the Docker platform. The Docker Daemon is a long-running service that exposes an API. The API is used by the Docker Client, which, as the name suggests, is the client side of the Docker engine. Some popular clients are the Docker CLI or Docker Desktop, but a client can also be an application that a developer might build to interact with the Docker Daemon directly.
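This client-server split is directly visible from the CLI itself. As a quick check (the Go-template format string below is a standard `docker version` idiom):

```
# Report the client and the server (engine) versions separately,
# reflecting the client-server architecture described above:
docker version --format 'client: {{.Client.Version}}  server: {{.Server.Version}}'
```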
Let us define some more key components that form the basis of Docker’s architecture. The Docker Client uses commands such as `docker run`, `docker build`, and `docker pull` to manage containers and images.

The interaction between these components is concise: when any Docker command is executed, the Docker CLI uses the REST API to communicate with the Docker Daemon. The Docker Daemon then processes this request, performing step-by-step actions such as pulling an image from a registry or starting a new container. The client-server model ensures efficient communication and management of Docker resources, whether the client and daemon are on the same host or on different systems. The client and daemon communicate over a UNIX socket or a network interface, allowing for Docker's diverse integration options.
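To illustrate this API surface, the daemon's REST endpoints can be queried directly over its default UNIX socket, without the CLI (the socket path below is the standard default; adjust it for your installation):

```
# Talk to dockerd directly: the /version endpoint returns the same
# data that 'docker version' displays:
curl --unix-socket /var/run/docker.sock http://localhost/version
```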
The Docker Daemon is responsible for building, running, and distributing Docker containers. The Daemon listens for API requests and processes them to create or manage containers and images. This functionality allows developers to deploy applications in isolated environments, making it easier to manage dependencies and configurations. The Daemon can also communicate with other Daemons in a distributed system, facilitating orchestration tasks across multiple hosts.
When you run commands in Docker, the Docker Daemon (`dockerd`) performs a series of internal operations to manage Docker objects such as containers, images, networks, and volumes. For example, when you run:

docker run hello-world

Here is the output of the command:
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
478afc919002: Pull complete
Digest: sha256:53cc4d415d839c98be39331c948609b659ed725170ad2ca8eb36951288f81b75
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(arm64v8)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
But what exactly is happening internally?
1. The Daemon checks whether the `hello-world` image is available locally. If not, it initiates a pull request to the configured Docker registry (the default is Docker Hub).
2. The downloaded image layers are stored in Docker's local storage directory (`/var/lib/docker`).
3. The Daemon creates a new container from the image and starts it. If the `-d` flag is specified, the container runs in detached mode, allowing it to operate in the background.

Now that we know how the Docker Daemon runs a container, let's look at the integration between the Docker Daemon and the container runtime, which is responsible for executing containers. The container runtime is a critical component of the Docker architecture, responsible for managing the execution and lifecycle of containers. In Docker, the default container runtime is `runc`, which operates in conjunction with `containerd`, an intermediate layer that interfaces with the Docker Daemon (`dockerd`). When a user issues a command (e.g., `docker run`), the Docker Daemon processes the request and communicates with `containerd` to create and manage the container. The `containerd` layer handles the lower-level operations, such as pulling images and starting containers, while `runc` is responsible for creating the container's environment and executing its processes. This collaboration ensures that containers are isolated, resource-managed, and run efficiently, utilizing Linux kernel features like namespaces and cgroups.
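On a typical installation you can see this stack reported by the daemon itself:

```
# Shows the registered runtimes (runc), the default runtime, and the
# containerd version the daemon is wired to (output varies by install):
docker info | grep -iE 'runtime|containerd'
```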
Additionally, Docker supports alternative container runtimes, enabling users to customize their environment as needed. Overall, this architecture ensures efficient and secure execution of containerized applications while allowing for flexibility in runtime selection.
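As a sketch of how an alternative runtime is registered, the daemon's configuration file accepts a `runtimes` map; the runtime name and binary path below are assumptions for illustration, and you should merge this with any existing configuration rather than overwrite it:

```
# Hypothetical /etc/docker/daemon.json entry registering a custom runtime:
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "runtimes": {
    "custom-runc": { "path": "/usr/local/bin/custom-runc" }
  }
}
EOF
sudo systemctl restart docker
# Select the runtime per container:
docker run --runtime=custom-runc hello-world
```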
Let’s delve into the complete process of building, pulling, and storing images in Docker, focusing on how the Docker Daemon handles these operations when commands are executed. We will cover both the `docker build` and `docker pull` commands in detail.
Building an image
Let's consider the command:
docker build -t sample_image:latest .
This command builds a Docker image from a `Dockerfile` located in the current directory (denoted by `.`) and tags it as `sample_image:latest`.
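For concreteness, here is a minimal, hypothetical `Dockerfile` that such a build might consume; the file contents and the `app.sh` script are assumptions for illustration:

```
# Write a minimal Dockerfile into the build context:
cat > Dockerfile <<'EOF'
FROM ubuntu
COPY app.sh /app/app.sh
RUN chmod +x /app/app.sh
CMD ["/app/app.sh"]
EOF
```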
But what happens internally?
1. When the `docker build` command is executed, the Docker client sends a request to the Docker Daemon via the Docker API to initiate the build process.
2. The Daemon reads the `Dockerfile` from the specified context (in this case, the current directory). It parses the `Dockerfile` to understand the instructions provided, such as `FROM`, `RUN`, `COPY`, and `CMD`.
3. If the `Dockerfile` specifies a base image (e.g., `FROM ubuntu`), the Daemon checks if this base image is available locally. If it is not found, the Daemon pulls it from the Docker registry, following the process outlined in the previous section (checking the registry, downloading layers, etc.).
4. As the Daemon executes the instructions in the `Dockerfile`, it creates a new layer for each command. For example:
   - `RUN`: executes commands in a new layer and commits the changes.
   - `COPY`: copies files from the host into the image, creating another layer.
   - `ENV`: sets environment variables, which also results in a new layer.
5. These layers are stored in Docker's local storage directory (typically `/var/lib/docker`). Docker uses a union filesystem (like OverlayFS) to manage these layers, allowing for efficient storage and retrieval. This approach minimizes duplication of data and enables quick access to the layers that make up an image. By stacking layers, Docker can efficiently manage changes and updates, saving both disk space and time during builds.
6. After processing the last instruction in the `Dockerfile`, the Daemon updates its internal metadata to reflect the new image, including its layers, tags, and configurations.

Pulling an image
Now, let’s revisit the `docker pull` command:
docker pull ubuntu
When you run this command, the output in the terminal will be similar to this:
latest: Pulling from library/ubuntu
9f23a71f1e31: Pull complete
Digest: sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee
Status: Downloaded newer image for ubuntu:latest
docker.io/library/ubuntu:latest
But here is the internal process that takes place:

1. When the `docker pull ubuntu` command is executed, the Docker client sends a request to the Docker Daemon via the Docker API to retrieve the `ubuntu` image from a Docker registry.
2. The Daemon downloads the image layers and stores them in Docker's local storage directory (typically `/var/lib/docker`). The layers are downloaded using HTTP requests, and the Daemon manages the progress and integrity of each layer by verifying checksums.
3. Once all layers are in place, the `ubuntu` image has been successfully pulled and is now available for use.

Both the `docker build` and `docker pull` commands involve intricate processes managed by the Docker Daemon. When building an image, the Daemon parses the `Dockerfile`, retrieves any necessary base images, creates layers based on the specified instructions, and updates its metadata. In contrast, when pulling an image, the Daemon communicates with the Docker registry, downloads the required layers, and stores them in the local storage directory.
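One way to observe the result of these layer operations is `docker history`, which lists the layers of a local image:

```
# Each row corresponds to an image layer created by a Dockerfile
# instruction; the sizes show how layers share and add data:
docker history ubuntu
```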
Docker supports several network types, each tailored for specific use cases. The Docker Daemon is responsible for managing these network types and ensuring proper communication between containers and the host system.
Bridge Network
The `bridge` network is the default network type in Docker. When a container is created without specifying a network, it is automatically attached to the `bridge` network. For example, when you run the following command, the Docker Daemon performs these steps to create and manage the bridge network:
docker run -d --name web --network bridge nginx
1. The Daemon creates a Linux bridge interface (`docker0`) on the host system.
2. It assigns a subnet (e.g., `172.17.0.0/16`) to the bridge interface.
3. It configures the rules that allow traffic to flow between the `bridge` network and the host system.
4. When a container is attached to the `bridge` network, the Daemon creates a virtual Ethernet pair (veth) interface. One end of the veth pair is attached to the container's network namespace, while the other end is attached to the `docker0` bridge on the host.
5. The Daemon allocates an IP address from the `bridge` network's IP range and assigns it to the container's network interface.

The bridge driver is used for creating bridge networks. It creates a Linux bridge interface on the host system and manages the IP address range for the network. The bridge driver is responsible for creating veth pair interfaces for containers and attaching them to the `docker0` bridge.
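You can inspect the objects the Daemon creates for this network directly; the interface name and subnet shown may vary by installation:

```
# Show the subnet, gateway, and connected containers of the default
# bridge network:
docker network inspect bridge
# The Linux bridge interface the daemon created on the host:
ip addr show docker0
```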
Host Network
The host network mode allows a container to use the host system's network stack directly, bypassing Docker's network isolation. The Docker Daemon performs the following steps when running a container in host mode:
The host driver is used for running containers in host network mode. It does not create any additional network interfaces or perform any network isolation, as the container directly uses the host system's network stack.
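As a minimal sketch of what this means in practice on a Linux host (the container name is illustrative), a service in a host-mode container binds directly to the host's ports:

```
# nginx listens on the host's port 80 directly; no -p publishing is
# needed because there is no network isolation to cross:
docker run -d --name web_host --network host nginx
curl http://localhost:80
```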
Overlay Network
The overlay network is used for multi-host networking in Docker Swarm. Docker Swarm is a built-in clustering and orchestration tool for managing multiple Docker hosts. It enables you to group several Docker Engines into a single virtual entity, allowing containers to communicate across different hosts while ensuring high availability and load balancing. In this respect it resembles Kubernetes; the fundamental differences between Docker Swarm and Kubernetes lie in architecture and scalability.
Docker Swarm uses a simpler architecture and scaling method, relying on a swarm of nodes that can be easily added or removed, making it straightforward to set up and manage. This simplicity makes Docker Swarm an ideal choice for smaller projects and teams. In contrast, Kubernetes employs a more complex architecture that offers advanced features such as pod replication, self-healing, and automatic rollouts and rollbacks. It is better suited for larger and more complex applications that require the management of stateful workloads, like databases, which need unique network configurations.
An overlay network allows containers on different Docker hosts to communicate securely. The Docker Daemon delegates the creation and management of overlay networks to the overlay driver:
The overlay driver is used for creating overlay networks in Docker Swarm. It enables secure communication between containers on different Docker hosts by creating an encrypted overlay network. The overlay driver also configures the embedded DNS server for service discovery and sets up the routing mesh for load balancing.
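A minimal sketch of creating an overlay network on a Linux host (the network and container names are illustrative):

```
# Overlay networks require Swarm mode:
docker swarm init
# --attachable allows standalone containers, not just Swarm services,
# to join the network:
docker network create -d overlay --attachable my_overlay
docker run -d --name web_overlay --network my_overlay nginx
```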
Apart from these drivers, the Docker Daemon also supports the MACVLAN driver, which allows containers to have their own MAC addresses, making them appear as physical network devices on the host system's network. This driver is useful when you need to integrate Docker containers with existing network infrastructure that relies on MAC addresses for routing or security policies. For example, if you have a legacy application that requires specific network configurations or firewall rules based on MAC addresses, MACVLAN lets your containers connect to the network seamlessly, just like any physical device. This capability can simplify the integration of containerized applications into traditional network setups.
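A hedged example of creating a MACVLAN network; the subnet, gateway, and parent interface below are assumptions that must match your physical network:

```
docker network create -d macvlan \
  --subnet=192.168.1.0/24 \
  --gateway=192.168.1.1 \
  -o parent=eth0 my_macvlan
# The container now appears on the physical network with its own MAC:
docker run -d --name web_macvlan --network my_macvlan nginx
```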
To understand how Docker manages persistent storage through data volumes and bind mounts, we need to explore the role of the Docker Daemon (`dockerd`) in handling these storage options, including the internal processes that occur when storage-related commands are executed. Docker containers are ephemeral by nature, meaning that any data stored within a container's writable layer is lost when the container is removed. To address this, Docker provides two primary mechanisms for persistent storage: data volumes and bind mounts. Choosing between them depends on your specific needs: data volumes offer better management and isolation, while bind mounts offer direct access to the host filesystem.
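To make the distinction concrete, the two mechanisms differ only in how the source side of the mount is specified; the volume name and host path below are illustrative:

```
# Data volume: Docker manages the source under /var/lib/docker/volumes/
docker run --rm -v my_volume:/data alpine ls /data
# Bind mount: the source is an existing host path you manage yourself
docker run --rm -v /path/on/host:/data alpine ls /data
# The equivalent, more explicit --mount syntax for a volume:
docker run --rm --mount type=volume,source=my_volume,target=/data alpine ls /data
```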
Data volumes
These are stored in a part of the host filesystem managed by Docker, typically under `/var/lib/docker/volumes/`. They are designed to persist data beyond the lifecycle of a single container and can be shared among multiple containers. The Docker Daemon manages these volumes, ensuring data integrity and isolation.
Creating a data volume
When you execute the following command:
docker volume create my_volume
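You can confirm where the Daemon placed the volume with `docker volume inspect` (the mountpoint path may vary by installation):

```
# The Mountpoint field typically points under /var/lib/docker/volumes/:
docker volume inspect my_volume
```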
Here are the things that happen internally:

- The Docker Daemon creates a directory for the volume on the host, at `/var/lib/docker/volumes/my_volume/`, and records the new volume in its metadata.

Using a Data Volume
To use the created volume in a container, you can run:
docker run -d --name my_container -v my_volume:/data alpine
Here is what happens internally:

- The Daemon verifies that `my_volume` exists and prepares to mount it to the `/data` directory in the container.

Bind mounts
You can specify a directory on the host filesystem that is mounted into a container. This gives you direct access to the host's filesystem from within the container. Unlike volumes, bind mounts can be located anywhere on the host filesystem.
Using a Bind Mount
To run a container with a bind mount, you can execute:
docker run -d --name my_container -v /path/on/host:/data alpine
Here are the things that happen internally:

- The Daemon checks whether the specified path (`/path/on/host`) exists on the host filesystem.

Now that we understand the internal processes involved in using bind mounts, it's important to note that Docker recommends data volumes over bind mounts for several reasons:
- Volumes live in a part of the host filesystem managed by Docker (`/var/lib/docker/volumes/`), allowing Docker to handle the lifecycle and cleanup of volumes automatically. This reduces the risk of orphaned data and makes it easier to manage storage.

We examined how the Docker Daemon interacts with various components, including the Docker Client and Docker Registry, to facilitate seamless operations such as building images, pulling images from registries, and managing container lifecycles. Additionally, we discussed the different types of Docker networks and networking drivers, highlighting how they enable effective communication between containers and the host system. Understanding these foundational concepts is essential for optimizing resource utilization, enhancing security, and orchestrating containerized applications effectively. This knowledge sets the stage for further exploration into more advanced topics.
In Part 2 of this series, we will delve into Communication and Security as well as Monitoring and Troubleshooting, along with extending the capabilities of the Docker Daemon. These topics will provide you with practical insights and techniques to enhance your Docker management skills and optimize the performance of your containerized applications.