1. Introduction

Docker is a platform for isolating software from the infrastructure on which it runs. Just as the core philosophy of Java is to separate the code from the system-level implementation, Docker applies the same principle: it uses a container that provides everything needed to run a piece of software.

In Java, the unit of distribution is code in an intermediate bytecode form, and the same bytecode runs on every machine. In Docker, the unit of distribution is an image, which packages the software along with all of its requirements.

Docker architecture: Docker consists of the following three components.

Docker client (docker): It’s a command-line tool that lets you talk to the Docker daemon, which manages the actual images. The client can communicate with multiple daemons.
Docker daemon (dockerd): The Docker daemon builds, runs, and distributes the images.
Docker registry: It’s a storage hub for Docker images. By default, Docker uses the public registry Docker Hub, and one can configure a private registry for Docker to use instead. On docker run or docker pull, Docker pulls the required images from the configured registry. On docker push, Docker pushes the image to the registry.

The components work together in a client-server architecture. This means that the client and the daemon can be either on the same system or on separate systems. They communicate using a REST API.
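To make the client-server idea concrete, the sketch below sends a request to the daemon the way a client would, using only the Python standard library. It assumes that the daemon listens on the Unix socket /var/run/docker.sock (the usual default on Linux) and that the current user is allowed to access it. The class and function names are our own and are not part of any Docker library.

```python
import http.client
import json
import socket

DOCKER_SOCKET = "/var/run/docker.sock"  # assumption: default socket path on Linux


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # The host name is only used for the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def daemon_get(path: str, socket_path: str = DOCKER_SOCKET):
    """Send GET <path> to the daemon and return the decoded JSON body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return json.loads(response.read())
    finally:
        conn.close()
```

With a running daemon, daemon_get("/version") should return the version information as a dictionary, and daemon_get("/containers/json?all=true") lists the containers, much like docker ps -a.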

Docker objects:

images
containers
volumes
plugins

Some commonly used commands:

docker image ls: List all the images in Docker.
docker builder prune: Remove the unused build cache.
docker system df: Show the disk usage of Docker.
docker stop <CONTAINER-ID>: Stop a container using its ID. It sends a SIGTERM signal to the main process in the container and, if the process has not exited after a grace period, a SIGKILL.
docker rm -f <CONTAINER-ID>: Force-remove a container using its ID. If the container is running, it sends a SIGKILL signal to the main process, which kills it immediately, and then removes the container.
docker ps -a: Show all the containers, including the stopped ones.

2. Image

An image is the packaged module that wraps the app along with all of its dependencies. It holds a copy of the app together with the result of the instructions used to build it. An image consists of layers, and each layer corresponds to an instruction in a Dockerfile. The Dockerfile lays out the complete set of instructions to build the image for the app. The image is immutable, which means that once an image is created it cannot be modified. However, one can add new layers on top of it. This makes it possible to use an existing image as a base for building another image. For example, one can start with an image for Python and build another image by adding the layers required for a specific app.
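
The idea of immutable layers stacked on a base can be sketched with a toy model. This is only an illustration: the Image class is hypothetical, and real Docker computes layer digests from the layer contents, not from the instruction text as done here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """A toy image: an immutable, ordered tuple of layer digests."""

    layers: tuple[str, ...] = ()

    def add_layer(self, instruction: str) -> "Image":
        # In this toy model a digest depends on the parent layer and the instruction.
        parent = self.layers[-1] if self.layers else ""
        digest = hashlib.sha256((parent + instruction).encode()).hexdigest()[:12]
        # A new Image is returned; the existing one is left untouched.
        return Image(self.layers + (digest,))


# Build a base image, then a second image on top of it.
base = Image().add_layer("FROM python:3.12")
app = base.add_layer("RUN pip install requests").add_layer("COPY . .")
```

Here app shares its first layer with base, and base itself still has a single layer after app has been built.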

One can find images directly on Docker Hub, or from the CLI with docker search docker/<image-name>. The command returns a list of images with the columns NAME, DESCRIPTION, STARS, and OFFICIAL. Once you find the image you are looking for, you can pull it with docker pull docker/<image-name>.

3. Creating Image

An image is created from a Dockerfile, which lays out the instructions to build the image. Each instruction in the Dockerfile follows the syntax <instruction> <arguments>. Instructions are case-insensitive; however, it is common practice to write them in UPPERCASE.

As discussed above, the image consists of layers. So, how exactly do the layers map to the instructions in the Dockerfile? Only some instructions create new layers, and the layers they create are immutable. These are the instructions that add new content to the image, such as files and programs; some of them are FROM, RUN, COPY, and WORKDIR. Instructions such as EXPOSE, ENV, CMD, and ENTRYPOINT don’t create any layers; they only add metadata.
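
As a rough illustration of this grouping, the sketch below walks through the text of a Dockerfile and labels each instruction. The two sets simply mirror the instructions named above, and the classify function is our own helper. It is a simplification that ignores line continuations and multi-line instructions.

```python
# The grouping mirrors the instructions named in the text; it is not exhaustive.
LAYER_INSTRUCTIONS = {"FROM", "RUN", "COPY", "WORKDIR"}
METADATA_INSTRUCTIONS = {"EXPOSE", "ENV", "CMD", "ENTRYPOINT"}


def classify(dockerfile: str) -> list[tuple[str, str]]:
    """Return (instruction, kind) pairs, one per instruction line."""
    result = []
    for line in dockerfile.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Instructions are case-insensitive, so normalise to UPPERCASE.
        keyword = line.split(maxsplit=1)[0].upper()
        if keyword in LAYER_INSTRUCTIONS:
            kind = "layer"
        elif keyword in METADATA_INSTRUCTIONS:
            kind = "metadata"
        else:
            kind = "other"
        result.append((keyword, kind))
    return result
```

For instance, a line such as run pip install x is reported as ("RUN", "layer"), while ENV A=1 is reported as ("ENV", "metadata").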

The Dockerfile is essentially a set of Linux instructions laid out to systematize and automate the app-building process. Every build in Docker starts by extending a pre-built image. The pre-built image serves as the base upon which the rest of the build layers are stacked. In a Dockerfile this is done with the FROM instruction at the beginning of the file.

FROM image-name:version AS base

The FROM instruction pulls the specified image and treats it as the base layer on which to build the rest of the app. For Python projects, the base image can be a Python image such as python:3.12. If the image is being built from scratch, the scratch base can be used, which is essentially an empty layer.
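
The image-name:version form can be taken apart with a few lines of code. The helper below is a simplified sketch of our own: it relies on the fact that Docker falls back to the latest tag when none is given, and it does not handle digest references of the form name@sha256:...

```python
def split_image_ref(ref: str) -> tuple[str, str]:
    """Split 'name:tag' into (name, tag); the tag defaults to 'latest'."""
    name, sep, tag = ref.rpartition(":")
    # A colon that is followed by '/' belongs to a registry host:port, not to a tag.
    if not sep or "/" in tag:
        return ref, "latest"
    return name, tag
```

So split_image_ref("python:3.12") gives the name python with the tag 3.12, and a bare python is treated as python with the tag latest.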

A typical Dockerfile goes through the following steps.

Pull an image and make it the first layer of the build stage. This sets up the Linux filesystem for the container, with the packages of the image stored at the appropriate locations.
Set the working directory in the container for the run commands.
Once the base filesystem is in place, run the commands to install the dependencies of the app.
Copy the app’s source code from the host to the image.
Specify the commands to run when starting the container.
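
The five steps above can be sketched as a small generator that writes out a Dockerfile skeleton. The function name, its parameters, and the example values (requests, main.py) are hypothetical and serve only to show where each step ends up in the file.

```python
import json


def render_dockerfile(base_image, workdir, install_commands, entrypoint):
    """Render a Dockerfile skeleton that follows the five steps in order."""
    lines = [
        f"FROM {base_image}",    # step 1: pull the base image
        f"WORKDIR {workdir}",    # step 2: set the working directory
    ]
    lines += [f"RUN {cmd}" for cmd in install_commands]  # step 3: install dependencies
    lines.append("COPY . .")     # step 4: copy the source code from the host
    # step 5: the command to run at container start, in exec (JSON array) form
    lines.append("ENTRYPOINT " + json.dumps(entrypoint))
    return "\n".join(lines) + "\n"


print(render_dockerfile(
    "python:3.12-slim",
    "/app",
    ["python -m pip install requests"],
    ["python", "main.py"],
))
```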

We illustrate the process by creating an image for Airflow that runs the following DAG.

from datetime import datetime
from airflow.sdk import dag, task

args = {"dag_id": "Sample_DAG", "start_date": datetime(2026, 5, 4), "schedule": "@daily"}

@dag(**args)
def sample_dag():
    @task()
    def task1():
        print("Hey! This is running in task 1")
        return 120

    @task()
    def task2(a):
        print(f"Hey! This is running in task 2. Value passed: {a}")

    t1 = task1()
    t2 = task2(t1)

d = sample_dag()

The image contains Airflow together with the above DAG. The Dockerfile for building the image is the following.

# syntax=docker/dockerfile:1

ARG PYTHON_BASE=3.12
FROM python:${PYTHON_BASE}-slim

# The env vars map directly to the keys in airflow.cfg as AIRFLOW__<SECTION>__<KEY>
ENV AIRFLOW_HOME=/app/airflow
ENV AIRFLOW__CORE__DAGS_FOLDER=/app/dags
ENV AIRFLOW__LOGGING__BASE_LOG_FOLDER=/app/log
ENV AIRFLOW__CORE__LOAD_EXAMPLES=False
ENV AIRFLOW__CORE__AUTH_MANAGER=airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager

WORKDIR /app/

ARG AIRFLOW_VERSION=3.2.1
# Redeclare without a value to reuse the PYTHON_BASE declared before FROM
ARG PYTHON_BASE
ARG CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_BASE}.txt"

RUN python -m pip install "apache-airflow[fab]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

COPY . .

RUN chmod +x /app/airflow.sh

ENTRYPOINT ["/app/airflow.sh"]

Only ARG can appear before the first FROM instruction. An ARG declared before FROM is outside of any build stage, so it can be used only in FROM lines; to use it inside a build stage, it has to be declared again after FROM.
The ENV instructions declare the environment variables that configure the Airflow installation.
WORKDIR sets the working directory in the image, in which all the following instructions are executed.
The RUN instruction installs Airflow with pip, pinned to the versions listed in the constraints file.
COPY copies all the files in the project root to the working directory of the image.
ENTRYPOINT declares the command to be executed when a container is started from the image. Here, it executes the airflow.sh script.
The airflow.sh file starts the processes and creates the user with the credentials passed as environment variables.

#!/bin/bash
airflow db migrate
airflow users create \
  --username "${USERNAME}" \
  --firstname "${FIRSTNAME:-Admin}" \
  --lastname "${LASTNAME:-user}" \
  --role Admin \
  --email "${EMAIL}" \
  --password "${PASSWORD}"
airflow dag-processor &
airflow triggerer &
airflow scheduler &
airflow api-server --port 8080
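
The AIRFLOW__<SECTION>__<KEY> naming used by the ENV instructions in the Dockerfile can be checked with a short helper. The sketch below is our own and only illustrates how such a name splits into a section and a key of airflow.cfg; it is not how Airflow itself reads its configuration.

```python
def parse_airflow_env(name: str):
    """Return (section, key) for AIRFLOW__<SECTION>__<KEY>, or None otherwise."""
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        return None  # e.g. AIRFLOW_HOME uses a single underscore and is not a config key
    # The first double underscore after the prefix separates section and key.
    section, sep, key = name[len(prefix):].partition("__")
    if not sep or not section or not key:
        return None
    return section.lower(), key.lower()
```

For example, AIRFLOW__CORE__DAGS_FOLDER maps to the dags_folder key in the core section, while AIRFLOW_HOME is not a configuration key in this scheme.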

To build the image from the Dockerfile, run the build command in the project root.

docker build -t my_airflow .

The . at the end of the line sets the build context to the current folder, which is the project root; by default, Docker looks for the Dockerfile there. The built image can be checked using the command docker image ls.

IMAGE	ID	DISK USAGE	CONTENT SIZE	EXTRA
my_airflow:latest	e874eea163f4	683MB	185MB

Note that the user is not created yet. The build command only builds the image with all the components of the project in place. The commands in the airflow.sh script are executed when a container is created from the image. To create the container, run the following command.

docker run -p 8080:8080 --env-file /path/to/.env --name sample_airflow my_airflow

Here, my_airflow is the image tag, and --name gives the container a name. The -p 8080:8080 option maps the host’s port 8080 to the container’s port 8080. The --env-file option passes the .env file that contains the environment variables used in the airflow.sh script.
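
Before starting the container, it can help to check that the .env file defines every variable that airflow.sh reads without a default. The sketch below is a simplified reading of the VAR=value format, written by us for illustration: lines without an = sign are ignored here, and values are taken literally.

```python
# Variables that airflow.sh uses without a fallback value.
REQUIRED = ("USERNAME", "EMAIL", "PASSWORD")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse VAR=value lines; blank lines and lines starting with '#' are skipped."""
    env = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        name, sep, value = stripped.partition("=")
        if sep:
            env[name.strip()] = value
    return env


def missing_variables(env: dict[str, str]) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED if not env.get(name)]
```

A file that sets USERNAME and EMAIL but not PASSWORD would make missing_variables return a list containing only PASSWORD.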

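To open a shell inside the container, use the command docker exec -it $(docker ps -a | awk '{if(NR>1){print $1}}') /bin/bash. The $(docker ps -a | awk '{if(NR>1){print $1}}') part fetches the container ID by skipping the header row and printing the CONTAINER ID column. It works here because there is only one container; with several containers it returns several IDs, in which case the container name can be used instead, as in docker exec -it sample_airflow /bin/bash.

To stop and remove the container, use the command docker rm -f $(docker ps -a | awk '{if(NR>1){print $1}}'). Note that this removes every container listed by docker ps -a.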
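
What the awk expression does can be written out in Python. The function below is a sketch of our own that works on the text printed by docker ps -a, and the sample row in the usage is made up for illustration.

```python
def container_ids(ps_output: str) -> list[str]:
    """Return the first column of every row after the header line."""
    rows = ps_output.splitlines()[1:]  # NR>1: skip the header row
    return [row.split()[0] for row in rows if row.strip()]  # $1: first field


# Made-up sample output with a single container.
sample = (
    "CONTAINER ID   IMAGE        COMMAND   CREATED         STATUS         NAMES\n"
    "3f2a9c1b7d4e   my_airflow   airflow   2 minutes ago   Up 2 minutes   sample_airflow\n"
)
print(container_ids(sample))
```

The same list of IDs can be obtained directly with docker ps -aq, where the -q option prints only the container IDs.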

4. Multi-stage Build

In the above example we created a base layer from the Python image and built the Airflow layer on top of it. Airflow comes with its own set of configuration options, some of which we saw above: AIRFLOW__CORE__DAGS_FOLDER, AIRFLOW__LOGGING__BASE_LOG_FOLDER, AIRFLOW__CORE__LOAD_EXAMPLES, and AIRFLOW__CORE__AUTH_MANAGER. These can be changed at runtime.

It is useful to separate out a raw Airflow image on top of which the project-specific layers can be built. Docker provides multi-stage builds for this purpose. A multi-stage Dockerfile contains more than one FROM instruction. In the example below we upgrade the previous Dockerfile to a multi-stage build. The first block builds a raw Airflow image, which is reused in the second block. The second block adds the Airflow providers required for the specific application and the project’s dependencies.

We can also isolate the first block in a separate Dockerfile and build a raw airflow-base image from it. We can then reuse the same image in multiple projects with different requirements.

# syntax=docker/dockerfile:1

ARG PYTHON_BASE
FROM python:${PYTHON_BASE}-slim AS airflow-base

ARG PYTHON_BASE
ARG AIRFLOW_VERSION

ENV PYTHON_BASE=${PYTHON_BASE}
ENV AIRFLOW_VERSION=${AIRFLOW_VERSION}
ENV AIRFLOW_HOME=/opt/airflow
WORKDIR ${AIRFLOW_HOME}

ARG CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_BASE}.txt"

RUN python -m pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

ENV AIRFLOW_CONSTRAINT_URL=${CONSTRAINT_URL}

# Build the project image using the above image as base layer.
FROM airflow-base

ARG AIRFLOW_VERSION=${AIRFLOW_VERSION}
ARG CONSTRAINT_URL=${AIRFLOW_CONSTRAINT_URL}

RUN python -m pip install "apache-airflow-providers-fab" --constraint "${CONSTRAINT_URL}"
RUN python -m pip install "apache-airflow-providers-postgres" --constraint "${CONSTRAINT_URL}"
RUN python -m pip install psycopg2-binary

WORKDIR /app

COPY . .

RUN chmod +x /app/airflow.sh

ENTRYPOINT ["/app/airflow.sh"]
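
The ARG instructions in this Dockerfile have no default values, so PYTHON_BASE and AIRFLOW_VERSION have to be supplied at build time with --build-arg. As a rough sketch, the helper below (a hypothetical function of our own) assembles such a docker build command. The values 3.12 and 3.2.1 are the ones used in the single-stage Dockerfile earlier.

```python
import shlex


def build_command(tag, build_args, context=".", target=None):
    """Assemble the argument list for a docker build invocation."""
    argv = ["docker", "build", "-t", tag]
    for name, value in build_args.items():
        argv += ["--build-arg", f"{name}={value}"]  # one flag per ARG
    if target:
        argv += ["--target", target]  # stop after the named build stage
    argv.append(context)
    return argv


args = {"PYTHON_BASE": "3.12", "AIRFLOW_VERSION": "3.2.1"}
print(shlex.join(build_command("my_airflow", args)))
```

Passing target="airflow-base" adds --target airflow-base, which builds only the first stage and gives a reusable raw Airflow image.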
