How to Build Minimal Docker Containers for Python Applications

Build a Docker container for Python applications with multi-stage builds. If your Docker Python build requires system dependencies that are NOT required at runtime, structure your build as follows:

  1. Use a multi-stage build
  2. Stage 1 installs system dependencies and uses them to build local wheels
  3. Stage 2 begins from the same base as Stage 1, copies wheels from Stage 1, and installs the wheels
  4. The rest of your build will be based on Stage 2

If you follow these steps, you'll end up with a Python Docker image that keeps all your Python dependencies intact but carries none of the build-only system dependencies.

Note: this post references Docker 18.03, Python 3.6, and pip 10. I assume that you are running CPython (Python's reference implementation).

The problem

We want to build a Python application in Docker. Python builds often require installing third-party packages, and some of those packages contain code or resources that must be compiled during installation. For simplicity's sake, assume we are talking about source code in the C programming language. Since a Docker container will be our "target machine", we'll need a C compiler in our Docker container. Unfortunately, C compilers are large programs. Since we plan to scale the number of containers up and down with demand for the service, the image should ideally be as small as possible.

Basically, we want to build C code with a C compiler and then throw away the C compiler to save space in our deployment image.

Examples

The following examples should clarify the problem and its resolution. Note: I'm assuming that you're using a POSIX-inspired system.

Setup

Copy the following Makefile into your current working directory.

# Note: each recipe line below must begin with a tab character, not spaces.
.PHONY: build-break
build-break:
	docker build -t blog-python:break -f ./Dockerfile.break .

.PHONY: build-big
build-big:
	docker build -t blog-python:big -f ./Dockerfile.big .
	docker images

.PHONY: build-uninstall-big
build-uninstall-big:
	docker build -t blog-python:big-uninstall -f ./Dockerfile.uninstall .
	docker images

.PHONY: build-small
build-small:
	docker build -t blog-python:small -f ./Dockerfile.small .
	docker images

Example 1: broken build requiring a C compiler

We have a simple, entrypoint-less Docker container in which we must install uWSGI. In the uWSGI quickstart guide, its developers clarify that it "is a (big) C application, so you need a C compiler (like gcc or clang) and the Python development headers".

Copy the following code into a file called "Dockerfile.break":

FROM python:3.6-alpine as breakimage

RUN pip install uwsgi

Now run the following shell command in the same directory as your Dockerfile.break.

make build-break

At the end of our failed build, we see this Traceback (in addition to other helpful messages):

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-tkd8plx9/uwsgi/setup.py", line 137, in <module>
    'Programming Language :: Python :: 3.6',
  File "/usr/local/lib/python3.6/site-packages/setuptools/__init__.py", line 129, in setup
    return distutils.core.setup(**attrs)
  File "/usr/local/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/local/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/usr/local/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/tmp/pip-install-tkd8plx9/uwsgi/setup.py", line 77, in run
    conf = uc.uConf(get_profile())
  File "/tmp/pip-install-tkd8plx9/uwsgi/uwsgiconfig.py", line 747, in __init__
    raise Exception("you need a C compiler to build uWSGI")
Exception: you need a C compiler to build uWSGI

Consistent with the uWSGI documentation, our system has said that we "need a C compiler to build uWSGI". We'll do that in example 2.

Example 2: large build with C compiler installed

In this example, we'll install our system dependencies so uWSGI can actually be built.

Copy the following code into a file called "Dockerfile.big":

FROM python:3.6-alpine as bigimage

RUN apk add --no-cache linux-headers g++

RUN pip install uwsgi

Now run the following shell command:

make build-big

In the "build-big" make target, I've included a command to list all Docker images on your system. Because of this command, you should see something close to the following in your terminal:

REPOSITORY          TAG                 IMAGE ID            CREATED                  SIZE
blog-python         big                 8a68d0dad407        Less than a second ago   251MB
python              3.6-alpine          8eb1c554687d        16 hours ago             90.4MB

The good

The image built successfully.

The bad

The image is unnecessarily large.

We're planning on scaling our web-service to handle a decent amount of traffic. Scaling will involve deploying many images on many servers. Larger images take longer to deploy and (obviously) take up more space than smaller images.

The ugly

We are including an unnecessary dependency.

We don't need a C compiler in the image, so the C compiler is an unnecessary dependency. Including an unnecessary dependency in our runtime image is a horrible design, similar to including an unnecessary Python dependency in our requirements.txt or setup.py. As great software developers, we HATE bad system design, so let's find a way to resolve the "bad" and the "ugly" while preserving the "good"!

Example 3: failed attempt at simply "uninstalling" C compiler

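Unfortunately, if we want to reduce our image size, we cannot simply "uninstall" the C compiler. Docker images are built from stacked, read-only layers, and each RUN instruction adds a new layer on top of the previous ones. Removing a package in a later instruction only hides its files from the final filesystem; the layer that installed them is still part of the image, so uninstalling a dependency does NOT reduce the image size.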
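
One way to picture why the size does not drop is a toy model in Python. This is only a sketch, not how Docker is implemented: it assumes every Dockerfile instruction adds one immutable layer and that the reported image size is the sum of all layers. The layer sizes and the helper names (image_size_mb, visible_files) are made up for illustration.

```python
# Toy model of image layers (an illustration, not Docker's real implementation).
# Assumption: every Dockerfile instruction adds one immutable layer, and the
# image size is the sum of all layer sizes.

def image_size_mb(layers):
    """Total size: every layer counts, even if a later layer hides its files."""
    return sum(layer["adds_mb"] for layer in layers)

def visible_files(layers):
    """Files a running container would see after applying layers in order."""
    files = set()
    for layer in layers:
        files |= set(layer.get("files", ()))
        files -= set(layer.get("deletes", ()))
    return files

# Sizes below are illustrative, not measurements.
layers = [
    {"name": "FROM python:3.6-alpine", "adds_mb": 90, "files": {"python"}},
    {"name": "RUN apk add g++", "adds_mb": 150, "files": {"g++"}},
    {"name": "RUN pip install uwsgi", "adds_mb": 10, "files": {"uwsgi"}},
    # Deleting only records that the files are hidden; it adds ~0 MB
    # and cannot shrink the layers beneath it.
    {"name": "RUN apk del g++", "adds_mb": 0, "deletes": {"g++"}},
]

print(sorted(visible_files(layers)))  # g++ is gone from the container's view
print(image_size_mb(layers))          # but the total still includes its layer
```

In this model the compiler disappears from the visible filesystem, yet the total is the same with or without the final "apk del" layer.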

Copy the following code into a file called "Dockerfile.uninstall":

FROM python:3.6-alpine as bigimage-uninstalled

RUN apk add --no-cache linux-headers g++

RUN pip install uwsgi

RUN apk del linux-headers g++

Now run the following shell command:

make build-uninstall-big

You should see something close to the following in your terminal:

REPOSITORY          TAG                 IMAGE ID            CREATED                  SIZE
blog-python         big-uninstall       10a0eb5d42aa        Less than a second ago   251MB
blog-python         big                 8a68d0dad407        11 minutes ago           251MB
python              3.6-alpine          8eb1c554687d        16 hours ago             90.4MB

Our efforts at removing our C compiler proved futile. At this point, lesser developers would give up and assume we've reached the end of the road. But you, dear reader, are reading my blog, and I know you're better than that! Let's dig deeper and find an elegant way to shrink our Docker image!

Example 4: small final build without C compiler

This final example results in a small image with uWSGI installed and without a C compiler. It relies heavily on multi-stage builds and on pip wheels.

Copy the following code into a file called "Dockerfile.small":

###########################################
# Throwaway image with C compiler installed
FROM python:3.6-alpine as bigimage

# install the C compiler
RUN apk add --no-cache linux-headers g++

# instead of installing, create a wheel
RUN pip wheel --wheel-dir=/root/wheels uwsgi

###########################################
# Image WITHOUT C compiler but WITH uWSGI
FROM python:3.6-alpine as smallimage

COPY --from=bigimage /root/wheels /root/wheels

# Ignore the Python package index
# and look for archives in
# /root/wheels directory
RUN pip install \
      --no-index \
      --find-links=/root/wheels \
      uwsgi

Now run the following shell command:

make build-small

You should see something close to the following in your terminal:

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
blog-python         small               b952f6280b00        1 second ago        97.4MB
<none>              <none>              91c7bb911f32        3 minutes ago       249MB
blog-python         big-uninstall       10a0eb5d42aa        23 minutes ago      251MB
blog-python         big                 8a68d0dad407        34 minutes ago      251MB
python              3.6-alpine          8eb1c554687d        16 hours ago        90.4MB

Notice that the image tagged "small" is ~61% smaller than its "big" counterparts. (The untagged 249MB image is the throwaway build stage.) The "small" image adds only 7 MB to its base Alpine image. These megabytes are the installed uWSGI package plus the wheel file we copied into /root/wheels. To get much smaller, we would need to make modifications to uWSGI itself. I leave uWSGI modifications as an exercise for the reader.
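
For readers who want to check the arithmetic, here is a short Python sketch that recomputes the numbers from the sizes printed in the listing above. The variable names are mine; the values are copied from the "docker images" output.

```python
# Sizes in MB, copied from the `docker images` listing above.
base_mb = 90.4    # python:3.6-alpine
big_mb = 251.0    # blog-python:big
small_mb = 97.4   # blog-python:small

reduction = (big_mb - small_mb) / big_mb   # fraction saved versus "big"
overhead_mb = small_mb - base_mb           # what our build added to the base

print(f"reduction: {reduction:.1%}")       # reduction: 61.2%
print(f"overhead:  {overhead_mb:.1f} MB")  # overhead:  7.0 MB
```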

Explanation

Two key points are responsible for our Docker build's success:

  1. Reliance on copying between image stages in Docker multi-stage builds. This gets around the problem from Example 3, where files installed in a single image could not be removed afterwards.

  2. Understanding the difference between "pip install" and "pip wheel"

Copying between Docker build stages in a multi-stage build

Unless we explicitly specify a --target, Docker multi-stage builds will tag their last stage. Downstream build stages can reference upstream build stages and copy resources from them, similarly to how resources can be copied from any local or remote file system into a traditional Docker container. Therefore, we "compile" our Python code in one build stage and copy this compiled code into another build stage. Since the code no longer needs to be compiled, we don't need a C compiler or Linux headers. As the coup de grâce, our build's final stage is not based on any image with a C compiler installed, so the compiler never becomes part of the final image and the problem from Example 3 cannot occur.
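
If you ever want to inspect the throwaway stage on its own, --target lets you stop the build at a named stage. The following is a minimal Python sketch that only assembles the command; the helper name build_stage_command and the tag "blog-python:builder" are hypothetical, while the stage name "bigimage" and the file "Dockerfile.small" come from Example 4.

```python
import subprocess

def build_stage_command(stage, tag, dockerfile="./Dockerfile.small", context="."):
    """Return the `docker build` argv that stops after the named stage."""
    return [
        "docker", "build",
        "--target", stage,   # stop at this stage instead of the last one
        "-t", tag,
        "-f", dockerfile,
        context,
    ]

# "blog-python:builder" is a made-up tag for the throwaway stage.
cmd = build_stage_command("bigimage", "blog-python:builder")
print(" ".join(cmd))

# To actually run the build (requires Docker on this machine):
# subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the sketch safe to execute on a machine without Docker.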

Thanks to Docker's multi-stage builds, we are able to compile our Python package and avoid deploying the build's system dependencies in our final image.

Difference between "pip install" and "pip wheel"

Docker multi-stage builds are cool and all, but I've seen many articles about them. Python's packaging tool, pip, hasn't gotten as much careful attention from the blogging community. Hopefully this section can clear up one common point of confusion: pip install vs pip wheel.

pip install

This is the command most people are familiar with. At a high level, it takes a Python package, runs its setup.py, downloads and installs its dependencies, and potentially does a lot more. Run "pip install" when you want to expand a package's contents and use it as its author intended.

A good mental model: "pip install" takes a consolidated bundle of code / build instructions and places the package's content and dependencies wherever they need to go on an operating system. Once "pip install" runs on our machine, file placement throughout our file system can be pretty hamajang, depending on a package's setup.py instructions.

pip wheel

This tool is mostly used by library developers wanting to distribute their packages in a user-friendly way. For example, scikit-learn, a popular Python library for machine learning, requires a lot of system dependencies to build. Many Python users, especially data scientists, are either unwilling or unable to install these dependencies on their host machines. This led to unfortunate platforms like Anaconda (author opinion). On a more mature note, for those of us with the appropriate dependencies installed, the installation process would often take a very long time; C, Fortran, and possibly other languages each needed to be compiled, and installing code written in these languages often leads to a long coffee break.

Wheels enable Python developers to compile a package, and its dependencies, in a distributable form targeting common operating systems and architectures. Today, most scikit-learn users install it using its wheel, which takes a fraction of the time of the regular build process.

A good mental model: "pip wheel" takes a Python package, makes it ready to be installed on any compatible target machine (same Python version, operating system, and architecture) WITHOUT its build dependencies, and puts it in ONE easily-distributed archive file.
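
A wheel's filename records which Python version, ABI, and platform it was built for, which is one reason Stage 2 should start from the same base image as Stage 1. Here is a small Python sketch that splits a wheel filename into the fields described in PEP 427. The helper names are mine, and both filenames are illustrative examples rather than output from the builds in this post.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into the fields defined by PEP 427."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }

def is_pure_python(info):
    """A wheel with no ABI and no platform restriction contains no compiled code."""
    return info["abi"] == "none" and info["platform"] == "any"

# Illustrative filenames (assumptions, not taken from the builds above).
compiled = parse_wheel_filename("uWSGI-2.0.17-cp36-cp36m-linux_x86_64.whl")
pure = parse_wheel_filename("six-1.11.0-py2.py3-none-any.whl")

print(compiled["python"], compiled["abi"], compiled["platform"])
print(is_pure_python(compiled), is_pure_python(pure))
```

A wheel with compiled code is tied to the tags in its name, so installing it in an image with a different Python version or C library is likely to fail.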

Why do we care about this?

Not all Python packages are distributed as wheels. There are some packages, based mostly on C, that are hard to compile once and use in many places. uWSGI appears to be one of those packages. To build our final image, we construct a throw-away container to construct a wheel for uWSGI.

Conclusion

When building a Docker container for a Python application, we can install packages requiring build-time system dependencies AND remove these system dependencies from our final Docker image through a combination of Docker multi-stage builds, pip wheel, and pip install.
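
As a final sanity check, a few lines of Python run inside the finished container can confirm that the compiler is gone while uWSGI is present. This is a sketch under assumptions: the helper name find_tools is mine, and it assumes pip placed a "uwsgi" executable on the PATH.

```python
import shutil

def find_tools(names):
    """Map each executable name to its path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}

# Compilers we hope are absent from the final image.
compilers = find_tools(["gcc", "g++", "cc"])
# Assumption: installing uWSGI with pip puts a `uwsgi` executable on PATH.
runtime = find_tools(["uwsgi"])

print("compilers found:", {k: v for k, v in compilers.items() if v})
print("uwsgi:", runtime["uwsgi"])
```

Inside the "small" image you would hope to see an empty dictionary of compilers and a path for uwsgi; inside the "big" image the compilers should show up.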
