This guide will help you make your Docker image as small as possible while keeping it performant, and explain why you should use certain instructions.
We know that the Dockerfile is like a recipe file where we can specify things like the OS image to base it on, what libraries should be installed, environment variables, commands we want to run and much more. Everything is there, specified in the file, so it's super clear what you are getting. It's a great advancement from the days when things only worked on our machine, or when we spent hours or days installing things. It's progress.
We've created a Dockerfile to give you an idea of what it can look like. Let's discuss the various parts of the file to better understand it. Here goes:
# Dockerfile
FROM node:latest
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
ENTRYPOINT ["node", "app.js"]
This is a pretty typical-looking file. We select a base image, set a working directory, copy the files we need, install some libraries, expose a port and finally run the application. So what's wrong with that?
At first glance, everything looks the way we expect, but on closer inspection we can see that we are using node:latest as the base image. Let's build this into a Docker image with the command:
docker build -t optimize/node .
OK, let's now run docker images to see our image and get some more stats on it:
It weighs in at 899 MB
We have nothing to compare with yet, so let's change the base image to one called node:alpine and rebuild our image:
This image is based on the Alpine Linux project. In general, Alpine Linux images are much smaller than normal distributions. It comes with some limitations, for example it uses musl libc instead of glibc, so read up on them before you switch. In general it's a safe choice though.
Instructions such as RUN, COPY and ADD in the Dockerfile each create another image layer. Before creating one, however, Docker first checks the cache to see whether an existing layer can be reused.
When we come to instructions like ADD and COPY, we should know how they operate in the context of the cache. For both of these instructions, Docker calculates a checksum for each file and compares it with the checksum stored for the existing layer. If a checksum differs because a file changed, the cache is invalidated: Docker carries out the instruction and creates a new image layer, and every instruction after it is rebuilt as well.
The way Docker operates is to try to reuse as much as possible. The best thing we can do is to place the instructions, in the Dockerfile, from the least likely to change to the most likely to change.
What does that mean?
Let's look at the top of our Dockerfile:
FROM node:alpine
WORKDIR /app
Here we can see that the FROM instruction comes first, followed by WORKDIR. Neither of them is likely to change, so they are correctly placed at the top.
What is likely to change though?
Well, you are building an application, so the source files of your app, or the libraries you suddenly realize you need and pull in with npm install, are likely to change. It makes sense to place those instructions further down in the file.
What do we gain by doing this?
Speed. We gain speed when we build our Docker image because we've placed the instructions as efficiently as possible. So in summary, ADD, COPY and RUN are instructions that should happen later in the Dockerfile.
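As a rough sketch of this ordering, the Dockerfile from earlier could be rearranged so that dependency installation is cached separately from the source files. This assumes the project has a package.json (and optionally a package-lock.json) in its root, which the original example does not show.

```dockerfile
# Least likely to change: base image and working directory
FROM node:alpine
WORKDIR /app

# Changes only when dependencies change (assumes package.json exists in the project root)
COPY package*.json ./
RUN npm install

# Most likely to change: the application source
COPY . .

EXPOSE 3000
ENTRYPOINT ["node", "app.js"]
```

With this layout, editing a source file only invalidates the cache from the last COPY onwards, so the npm install layer can be reused between builds.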
As mentioned, instructions like RUN create a new image layer. Ensure you keep the number of them to a minimum. Group them if you can. Instead of writing:
RUN command
RUN command2
Organize them like so:
RUN command && \
    command2
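As a concrete but hypothetical illustration, an install step and its cleanup could share one RUN instruction. The cleanup command is an assumption added here for illustration; it is not part of the original Dockerfile.

```dockerfile
FROM node:alpine
WORKDIR /app
COPY package*.json ./

# One layer instead of two: install dependencies, then clear npm's
# download cache in the same instruction (the cleanup step is illustrative)
RUN npm install && \
    npm cache clean --force
```

Files removed in a later RUN still exist in the earlier layer, so cleaning up inside the same instruction is what actually keeps the layer small.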
When you build an app, it easily consists of a ton of files, but what you actually need to create your Docker image ends up being a much smaller number of files. If you create a .dockerignore file, you can define patterns, for example node_modules or .git, for files to leave out of the build, so that when we copy files we only get the ones we need for our container.
Whether you use CMD or ENTRYPOINT, you should NOT call the application directly like so: node app.js. Instead, try to define a starter script, such as npm start, and call that.
Why, you ask?
We want to stay flexible and make this instruction unlikely to change. We might actually end up changing how we start our app by gradually adding flags to it, like so: node app.js --env=dev --seed=true. You get the idea, it's potentially a moving target. By relying on npm start, a startup script, we get something more flexible: the start command can change in package.json while the Dockerfile stays the same.
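A minimal sketch of what this could look like is shown here. It assumes the project's package.json defines a start script (for example "start": "node app.js"), which is not shown in the original post.

```dockerfile
FROM node:alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000

# Assumes package.json contains: "scripts": { "start": "node app.js" }
# Flags such as --env=dev can then be added in package.json,
# without touching this instruction.
ENTRYPOINT ["npm", "start"]
```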
Using the LABEL instruction is a great way to describe your image better. You could use it to organize images, help with automation and other use cases. You know best what information makes sense to put there, but it exists to help you bring order to all your images, so leverage it to your advantage. A label is a key-value pair, like so: LABEL [key]=[value]. Every LABEL instruction can have multiple labels. In fact, it's considered good practice to collect all your labels under one LABEL instruction. You can do so by separating each key-value pair with a space character, or across lines like so:
LABEL key=value \
      key2=value2
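For illustration, a single LABEL instruction with several labels might look like the following. The keys and values are made-up placeholders, not a required naming scheme.

```dockerfile
FROM node:alpine

# All labels collected under one LABEL instruction
# (keys and values below are hypothetical examples)
LABEL version="1.0" \
      description="Example Node.js application" \
      maintainer="team@example.com"
```

After building, docker inspect on the image lists these labels alongside the rest of its metadata.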
EXPOSE is what you use to declare which ports the container listens on. To ensure we can talk to the container on that port, we use the -p flag with docker run, like so: docker run -p [external]:[exposed docker port]. It's considered best practice to set the exposed port to the default port of whatever you are running, like port 80 for an Apache server and 27017 for a MongoDB database.
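As a small sketch of how the two fit together, using the port and image name from the earlier example (the external port 8080 is an arbitrary choice for illustration):

```dockerfile
FROM node:alpine
WORKDIR /app
COPY . .
RUN npm install

# Documents that the app inside the container listens on port 3000
EXPOSE 3000
ENTRYPOINT ["node", "app.js"]

# Map host port 8080 (arbitrary) to container port 3000 when running:
#   docker run -p 8080:3000 optimize/node
```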
At first glance it looks like COPY and ADD do the same thing, but there is a difference. ADD can also extract local TAR archives, which COPY can't do. So be explicit: use COPY when you mean to copy files, and only use ADD when you need one of its specific features, like the mentioned TAR extraction.
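The difference could be sketched as follows. The archive name vendor.tar.gz and the target directory are hypothetical and only serve to show the extraction behavior.

```dockerfile
FROM node:alpine
WORKDIR /app

# Plain copy: files arrive in the image exactly as they are
COPY package*.json ./

# ADD unpacks a local tar archive into the target directory
# (vendor.tar.gz and /opt/vendor/ are hypothetical names)
ADD vendor.tar.gz /opt/vendor/
```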
There are many more best practices to follow when it comes to Dockerfiles, but the biggest gain mentioned in this post comes from using the smallest base image possible, like alpine. It can work wonders for your image size, especially if storage is something you pay for.
Have a read of the Dockerfile best practices docs for more great tips.