Docker is a software platform that allows you to quickly build, test, and deploy applications. Docker packs software into standardized units called containers that contain everything the software needs to run, including libraries, system tools, code, and runtime.
It has been a few years now since the container explosion, and the ecosystem is starting to settle down. After all the big names in the industry jumped on the containerization wagon, very often proposing their own solutions, it seems like Docker-based platforms are finally here to stay.
After Docker became the de facto standard, many things have evolved, most often under the hood. The Open Container Initiative (OCI) was created with the objective of standardizing runtimes and image formats, and Docker changed its internal plumbing to accommodate them, for instance by adopting runc and containerd-shim.
Now, after years of Docker freezing its user-facing APIs, we are starting to see movement again on this front, and many of the common practices we are used to seeing when building and running containers might have better alternatives. Other things have just been deprecated or fallen out of favor.
Dockerfiles can now be built with the BuildKit backend. This backend is smarter than the classic one: it speeds up the build process through smarter caching and parallelized build steps, and it also accepts new Dockerfile options. To quote the official documentation:
> Starting with version 18.09, Docker supports a new backend for executing your builds that is provided by the moby/buildkit project. The BuildKit backend provides many benefits compared to the old implementation. For example, BuildKit can:
>
> - Detect and skip executing unused build stages
> - Parallelize building independent build stages
> - Incrementally transfer only the changed files in your build context between builds
> - Detect and skip transferring unused files in your build context
> - Use external Dockerfile implementations with many new features
> - Avoid side-effects with rest of the API (intermediate images and containers)
> - Prioritize your build cache for automatic pruning
In order to select this backend, we export the environment variable `DOCKER_BUILDKIT=1`. If you miss the detailed output, just add `--progress=plain`.
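Put together, a BuildKit build looks something like this (the image name is just a placeholder):

```shell
# enable the BuildKit backend for this shell session
export DOCKER_BUILDKIT=1

# build as usual; --progress=plain restores the classic detailed output
docker build --progress=plain -t myimage .
```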
We are used to seeing a lot of trickery to try to keep each layer size to a minimum. Very often we use a `RUN` statement to download the source and build an artifact, for instance a `.deb` file or a compiled binary, and try to clean everything up in the same statement to keep the layer tidy.
```Dockerfile
RUN \
apt-get update ; \
DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y git make gcc libc-dev ; \
git clone https://github.com/some_repo ; \
cd some_repo ; \
./configure ; \
make ; \
make install; \
cd -; \
rm -rf some_repo; \
apt-get purge -y make gcc libc-dev git; \
apt-get autoremove -y; \
apt-get clean; \
find /var/lib/apt/lists -type f | xargs rm; \
find /var/log -type f -exec rm {} \;; \
rm -rf /usr/share/man/*; \
rm -rf /usr/share/doc/*; \
rm -f /var/log/alternatives.log /var/log/apt/*; \
rm /var/cache/debconf/*-old
```
This wastes time on every build and is ugly for what it does. It is better to use multi-stage builds.
```Dockerfile
FROM debian:stretch-slim AS some_bin

WORKDIR /root
RUN apt-get update; \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends git make gcc libc-dev; \
    git clone https://github.com/some_repo; \
    cd some_repo; \
    ./configure; \
    make

FROM debian:stretch-slim

COPY --from=some_bin /root/some_repo/some_bin /usr/local/bin/
```
In some cases, in order to avoid all this, I have seen people adding a binary blob to the git repository just so it can be copied in. It is much better to use this approach.
While it doesn’t always make sense for every Dockerfile, we can often squash our first layer and achieve a more maintainable Dockerfile. `--squash` is an experimental argument that needs to be enabled as we explained here. Instead of
```Dockerfile
RUN apt-get update; \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y mariadb; \
    apt-get autoremove -y; \
    apt-get clean; \
    find /var/lib/apt/lists -type f | xargs rm; \
    find /var/log -type f -exec rm {} \;; \
    rm -rf /usr/share/man/*; \
    rm -rf /usr/share/doc/*; \
    rm -f /var/log/alternatives.log /var/log/apt/*; \
    rm /var/cache/debconf/*-old
```
we can do
```Dockerfile
RUN apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y mariadb
RUN apt-get autoremove -y
RUN apt-get clean
RUN find /var/lib/apt/lists -type f | xargs rm
RUN find /var/log -type f -exec rm {} \;
RUN rm -rf /usr/share/man/*
RUN rm -rf /usr/share/doc/*
RUN rm -f /var/log/alternatives.log /var/log/apt/*
RUN rm /var/cache/debconf/*-old
```
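Assuming the daemon has experimental features enabled, the separate layers can then be collapsed at build time (the tag is a placeholder):

```shell
# merge all newly built layers into a single layer
docker build --squash -t myimage .
```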
We covered using `qemu-user-static` for creating builds for different architectures in a past article. When building different versions of the same image for different architectures, it is very common to see three exact copies of the same Dockerfile where the only thing that changes is the `FROM` line.

```Dockerfile
FROM debian:stretch-slim
# the rest of the file is duplicated
```

```Dockerfile
FROM armhf/debian:stretch-slim
# the rest of the file is duplicated
```

```Dockerfile
FROM arm64v8/debian:stretch-slim
# the rest of the file is duplicated
```
Just have one single file like so

```Dockerfile
ARG arch
FROM ${arch}/debian:stretch-slim
```

and build with
```shell
docker build . --build-arg arch=amd64
docker build . --build-arg arch=armhf
docker build . --build-arg arch=arm64v8
```
So we now have three builds with one Dockerfile, and we tag them like this

```shell
docker build . --build-arg arch=amd64   -t ownyourbits/example-x86
docker build . --build-arg arch=armhf   -t ownyourbits/example-armhf
docker build . --build-arg arch=arm64v8 -t ownyourbits/example-arm64
```
That’s all good, but we can now simplify the instructions the user has to type by creating a multi-arch manifest. This is still an experimental CLI feature, so we need to export `DOCKER_CLI_EXPERIMENTAL=enabled` to be able to access it.
```shell
export DOCKER_CLI_EXPERIMENTAL=enabled

docker manifest create --amend ownyourbits/example \
    ownyourbits/example-x86 \
    ownyourbits/example-armhf \
    ownyourbits/example-arm64

docker manifest annotate ownyourbits/example ownyourbits/example-x86   --os linux --arch amd64
docker manifest annotate ownyourbits/example ownyourbits/example-armhf --os linux --arch arm
docker manifest annotate ownyourbits/example ownyourbits/example-arm64 --os linux --arch arm64 --variant v8

docker manifest push -p ownyourbits/example
```
Now your users of any architecture only have to do

```shell
docker pull ownyourbits/example
```

and they will receive the correct version of the image.
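We can verify what the registry will serve to each platform with `docker manifest inspect` (also behind the experimental CLI flag):

```shell
# list the platforms included in the multi-arch manifest
docker manifest inspect ownyourbits/example
```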
Even if we run with `--restart=unless-stopped`, the only clue Docker or Docker Swarm has that things are OK is that the container has not crashed. If the container is unresponsive or returning errors, it won’t be restarted properly.
It is more robust to add a `HEALTHCHECK` statement to the Dockerfile

```Dockerfile
HEALTHCHECK CMD curl --fail http://localhost:8080/status || exit 1
```
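The check can be tuned further; for instance (the intervals and the `/status` endpoint are illustrative, and `curl` must be present in the image):

```Dockerfile
# probe every 30s, fail a probe after 3s, mark unhealthy after 3 failed probes
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD curl --fail http://localhost:8080/status || exit 1
```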
Even though containers are theoretically isolated, it is not good security practice to run processes as root inside them, in the same way you don’t run your web server as root.
Towards the end of your build you should add something like
```Dockerfile
RUN \
# other stuff
    useradd nonroot

USER nonroot
```
Also, if possible avoid relying on `sudo`, and if you don’t control the Dockerfile, at least run as a different, unprivileged user with `docker run -u nonroot:nonroot`.
Use `docker build --pull` in your scripts so you are always on the latest base image.
`MAINTAINER` is deprecated. Instead of

```Dockerfile
MAINTAINER nachoparker (nacho@ownyourbits.com)
```

use a `LABEL`, which can be inspected just like any other metadata

```Dockerfile
LABEL maintainer="nachoparker (nacho@ownyourbits.com)"
```
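For instance, a label set this way can be read back with `docker inspect` (the image name is a placeholder):

```shell
# print the maintainer label of a built image
docker inspect --format '{{ index .Config.Labels "maintainer" }}' myimage
```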
`ENV` variables remain in the container at run time and pollute its environment. For values that are only needed during the build, use `ARG` instead.
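A sketch of the difference (the package names and port are just examples): a build argument is gone once the container runs, while an `ENV` value stays.

```Dockerfile
# available only while building; not visible in `docker run ... env`
ARG BUILD_DEPS="git make gcc"
RUN apt-get update && \
    apt-get install -y --no-install-recommends ${BUILD_DEPS}

# this, in contrast, remains in the final image's environment
ENV APP_PORT=8080
```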
Speed up your builds by providing a cache for your package manager, ccache, git, and so on. This needs BuildKit (`DOCKER_BUILDKIT=1`) and the experimental Dockerfile syntax.
```Dockerfile
# syntax=docker/dockerfile:experimental

# FROM and the rest

RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt \
    apt-get install -y --no-install-recommends mongodb-server
```
If you require SSH credentials for your build, don’t copy `~/.ssh`, because the credentials will stay in the layer even if you remove them later. Set up an SSH agent and use the experimental feature for SSH mounts
```Dockerfile
RUN --mount=type=ssh git clone git@github.com:private_repo/repo.git
```
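On the host side, the agent is forwarded into the build with `--ssh` (the key path is illustrative):

```shell
# start an agent, load the key, and forward it to the build
eval "$(ssh-agent)"
ssh-add ~/.ssh/id_rsa
DOCKER_BUILDKIT=1 docker build --ssh default .
```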
If your build needs sensitive files that should not be made public, use secrets. This way, those files will only be visible to that `RUN` command during its execution, and their contents will disappear without a trace from all layers after that.

```Dockerfile
RUN --mount=type=secret,id=signing_key,dst=/tmp/sign.cert signing_command
```
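The secret itself is supplied from the host at build time; something like (the file name is illustrative):

```shell
# expose ./sign.cert to the build under the id "signing_key"
DOCKER_BUILDKIT=1 docker build --secret id=signing_key,src=./sign.cert .
```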
While I have pointed out what are, for me, the biggest offenders, it is always good to go back and review the officially recommended best practices.
Most of them we are familiar with: write small containers, rearrange layers to take advantage of the build cache, don’t add unnecessary packages; but it’s not uncommon to revisit them and discover something we missed before.
*Originally published by nachoparker at ownyourbits.com*
#docker #web-development