Dockerfile great practices for Beginners

This guide will help you keep your Docker image as small as possible, while also keeping it performant, and explain why you should use certain instructions.

What we know about the Dockerfile

We know that the Dockerfile is like a recipe file where we specify things like the OS image to base the build on, which libraries should be installed, environment variables, commands we want to run, and much more. Everything is specified in the file, so it's super clear what you are getting. It's a great advancement from the days when things only worked on our machine, or when we spent hours or days installing things. It's progress.

Our Dockerfile sample

We've created a Dockerfile to give you an idea of what it can look like. Let's discuss the various parts of the file to better understand it. Here goes:

# Dockerfile
FROM node:latest

WORKDIR /app

COPY . .

RUN npm install

EXPOSE 3000

ENTRYPOINT ["node", "app.js"]

This is a pretty typical looking file. We select an OS image, set a working directory, copy the files we need, install some libraries, open up a port and finally run the application. So what's wrong with that?

OS image size

At first glance everything looks the way we expect, but at a closer look we can see that we are using node:latest as the image. Let's try to build this into a Docker image with the command:

docker build -t optimize/node .

Ok, let's now run docker images to see our image and get some more stats on it:

It weighs in at 899 MB. Ok, we have nothing to compare that with yet, so let's change the base image to node:alpine and rebuild our image:
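If you want to try this yourself, here's a minimal sketch of the change and the rebuild, assuming the same Dockerfile and image tag as above:

# in the Dockerfile, swap the base image
FROM node:alpine

# then rebuild and check the size again
docker build -t optimize/node .
docker images optimize/node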

77.7 MB, WOW!!! That's a huge difference: our Docker image is more than ten times smaller. Why is that?

This image is based on the Alpine Linux project, and in general Alpine-based images are much smaller than those built on regular distributions. Alpine comes with some limitations, so it's worth reading up on them, but in general it's a safe choice.

The cache

Every instruction you specify in the Dockerfile creates another image layer. What Docker does, however, is first check the cache to see whether an existing layer can be reused before creating a new one.

When we come to instructions like ADD and COPY we should know how they behave with respect to the cache. For both of these instructions, Docker calculates a checksum for each file and stores it in the cache. On a subsequent build, each checksum is compared against the cached one; if it differs because a file has changed, the cache is invalidated, the instruction is executed again, and a new image layer is created.
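You can see the cache in action by simply building twice in a row without changing anything; the exact wording of the output depends on whether you use the classic builder or BuildKit:

# first build: every instruction is executed
docker build -t optimize/node .

# second build, no files changed: layers are reused
# the classic builder prints "Using cache", BuildKit marks steps as CACHED
docker build -t optimize/node .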

Order matters

The way Docker operates is to reuse as much as possible. The best thing we can do is to order the instructions in the Dockerfile from the least likely to change to the most likely to change.

What does that mean?

Let's look at the top of our Dockerfile:

FROM node:alpine

WORKDIR /app

Here we can see that the FROM instruction comes first, followed by WORKDIR. Neither of these is likely to change, so they are correctly placed at the top.

What is likely to change though?

Well, you are building an application, so the source files of your app, and the libraries you might suddenly realize you need, like an npm install, make sense to place further down in the file.

What do we gain by doing this?

Speed. We gain speed when we build our Docker image because we've placed the instructions as efficiently as possible. So in summary, ADD, COPY and RUN are instructions that should happen later in the Dockerfile.
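Applying this to our sample, a cache-friendly ordering could look like the sketch below, assuming a standard Node.js project with a package.json:

# Dockerfile
FROM node:alpine

WORKDIR /app

# copy only the dependency manifest first; this layer, and the npm install
# after it, are reused as long as package.json doesn't change
COPY package*.json ./

RUN npm install

# the source files change most often, so copy them last
COPY . .

EXPOSE 3000

ENTRYPOINT ["node", "app.js"]

Now editing a source file only invalidates the final COPY, and npm install is skipped on most rebuilds.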

Minimize the layers

Every instruction you add creates a new image layer. Keep the number of instructions to a minimum and group them where you can. Instead of writing:

RUN command
RUN command2

Organize them like so:

RUN command && \
    command2
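As a concrete, hypothetical example for an Alpine-based image, installing a system package and the app dependencies in a single RUN keeps it to one layer:

# group related setup steps into a single layer
RUN apk add --no-cache git && \
    npm install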

Include only what you need

When you build an app, it easily consists of a ton of files, but when it comes to what you actually need inside your Docker image, it ends up being a much smaller set. If you create a .dockerignore file, you can define patterns that ensure that when files are copied into the image, we only include the ones we need.
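A minimal .dockerignore for a Node.js project could look something like this; the exact entries depend on what your project contains:

# .dockerignore
node_modules
npm-debug.log
.git
Dockerfile
.dockerignore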

Define a start script

Whether you use the command CMD or ENTRYPOINT, you should NOT call the application directly, like node app.js. Instead, define a start script and call that, like npm start.

Why you ask?

We want to make sure we stay flexible and are unlikely to change this instruction. We might end up changing how we start our app by gradually adding flags to it, like node app.js --env=dev --seed=true. You get the idea: it's potentially a moving target. However, by relying on npm start, a start script, we get something more flexible.
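A sketch of what that could look like, assuming the hypothetical flags above end up in a start script in package.json:

"scripts": {
  "start": "node app.js --env=dev --seed=true"
}

and in the Dockerfile:

CMD ["npm", "start"]

Changing the startup flags now only touches package.json; the Dockerfile instruction stays the same.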

Use LABEL

Using the LABEL instruction is a great way to better describe your image. You could use it to organize your images, help with automation, and other use cases; you know best what information makes sense to put there, but it exists to help you bring order to all your images, so leverage it to your advantage. A label is a key-value pair, like LABEL [key]=[value]. Every LABEL instruction can hold multiple labels, and in fact it's considered good practice to collect all your labels under one LABEL instruction. You can do so by separating each key-value pair with a space character, or like so:

LABEL key=value \
      key2=value2
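As a concrete example, with made-up values:

LABEL maintainer="you@example.com" \
      version="1.0.0" \
      description="A sample Node.js app"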

Rely on default ports with EXPOSE

EXPOSE is what you use to declare which ports the container listens on. To ensure we can talk to the container on that port we use the -p flag with docker run, like docker run -p [external]:[exposed container port]. It's considered best practice to expose the default port of whatever you are running, like port 80 for an Apache server or 27017 for a MongoDB database.
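For our Node.js sample, which listens on port 3000, that could look like this (the host port 8080 is just an arbitrary choice):

# in the Dockerfile: declare the port the app listens on
EXPOSE 3000

# at run time: publish it, mapping host port 8080 to container port 3000
docker run -p 8080:3000 optimize/node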

Be explicit, use COPY over ADD

At first glance it looks like COPY and ADD do the same thing, but there is a difference: ADD is also able to extract TAR archives, which COPY can't do. So be explicit: use COPY when you simply mean to copy files, and only use ADD when you need one of its extra features, like the mentioned TAR extraction.
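A short sketch of the difference, with a hypothetical archive name:

# COPY: a plain file copy, the archive stays an archive
COPY app.tar.gz /app/

# ADD: the local archive is extracted into the target directory
ADD app.tar.gz /app/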

Summary

There are many more best practices to follow when it comes to Dockerfiles, but the biggest gain I've mentioned throughout this post is using the smallest base image possible, like alpine. It can work wonders for your image size, especially if storage is something you pay for.

Have a read through the Dockerfile best practices docs for more great tips.
