Reverse Engineer Docker Images into Dockerfiles

Introduction

As public Docker registries like Docker Hub and TreeScale grow in popularity, it has become common, outside the most restrictive environments, for admins and developers to casually download an image built by an unknown entity. It often comes down to convenience outweighing the perceived risk. When a Docker image is made publicly available, the Dockerfile is sometimes provided as well, either directly in the listing, in a Git repository, or through an associated link, but sometimes it is not. And even when the Dockerfile is available, we have few assurances that the published image is safe to use.

Maybe security vulnerabilities aren’t your concern. Perhaps one of your favorite images is no longer maintained, and you would like to update it so that it runs on the latest version of Ubuntu. Or perhaps the compiler shipped with another distribution has an exclusive feature that produces better-optimized binaries, and you have an uncontrollable compulsion to release a similar image that’s just a little more optimized.

Whatever the reason, if you wish to recover a Dockerfile from an image, there are options. Docker images aren’t a black box: you can often retrieve most of the information you need to reconstruct a Dockerfile. In this article, we will look inside a Docker image so that we can reconstruct, as closely as possible, the Dockerfile that built it.

We will do this with two tools: [Dedockify](https://github.com/mavenshark/Dedockify), a custom Python script provided for this article, and [dive](https://github.com/wagoodman/dive). The basic process flow is as follows.

Using `dive`

To build some quick intuition about how images are composed, we will use Dive to introduce ourselves to several advanced and potentially unfamiliar Docker concepts. Dive is an image exploration tool that lets you examine each layer of a Docker image.

First, let us create a simple, easy-to-follow Dockerfile that we can explore for testing purposes.

In an empty directory, enter the following snippet directly into the command line:

```shell
cat > Dockerfile << EOF ; touch testfile1 testfile2 testfile3
FROM scratch
COPY testfile1 /
COPY testfile2 /
COPY testfile3 /
EOF
```

Running the above creates a new Dockerfile and three zero-byte test files in the same directory:

```shell
$ ls
Dockerfile  testfile1  testfile2  testfile3
```

Now, let’s build an image from this Dockerfile and tag it as example1.

```shell
$ docker build -t example1 .
```

Building the example1 image should produce the following output:

```text
Sending build context to Docker daemon  3.584kB
Step 1/4 : FROM scratch
 --->
Step 2/4 : COPY testfile1 /
 ---> a9cc49948e40
Step 3/4 : COPY testfile2 /
 ---> 84acff3a5554
Step 4/4 : COPY testfile3 /
 ---> 374e0127c1bc
Successfully built 374e0127c1bc
Successfully tagged example1:latest
```

The following zero-byte example1 image should now be available:

```shell
$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
example1            latest              374e0127c1bc        31 seconds ago      0B
```

Note that since there’s no binary data, this image won’t be functional. We are only using it as a simplified example of how layers can be viewed in Docker images.

The size of the image shows that there is no base image. Instead of a base image, we used `scratch`, which instructs Docker to start from an empty, zero-byte image. We then modified that blank image by copying three zero-byte test files onto it and tagged the result as example1.

Now, let us explore our new image with Dive.

```shell
docker run --rm -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    wagoodman/dive:latest example1
```

Executing the above command should automatically pull wagoodman/dive from Docker Hub and then display Dive’s polished interface.

```text
Unable to find image 'wagoodman/dive:latest' locally
latest: Pulling from wagoodman/dive
89d9c30c1d48: Pull complete
5ac8ae86f99b: Pull complete
f10575f61141: Pull complete
Digest: sha256:2d3be9e9362ecdcb04bf3afdd402a785b877e3bcca3d2fc6e10a83d99ce0955f
Status: Downloaded newer image for wagoodman/dive:latest
Image Source: docker://example-image
Fetching image... (this can take a while for large images)
Analyzing image...
Building cache...
```

Scroll through the three layers of the image in the list to find the three files in the tree displayed on the right.

We can see the contents on the right change as we scroll through each layer. As each file was copied to a blank Docker scratch image, it was recorded as a new layer.
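To make the idea of stacked layers concrete, here is a toy model in shell. It is not how Docker’s storage drivers actually work: each layer is simply a directory holding the files that one `COPY` step added, and copying the layers in order onto an empty root directory reproduces the final filesystem. The `apply_layers` function and the `layerN`/`rootfs` directory names are hypothetical, and the model ignores deletions (whiteout files) and file metadata.

```shell
# Toy model of layer stacking (hypothetical helper, not part of Docker).
# $1 is the root directory to build; every further argument is a directory
# holding the files one layer added. Deletions (whiteouts) are ignored.
apply_layers() {
  root=$1
  shift
  mkdir -p "$root"
  for layer in "$@"; do
    # Copy the layer's contents on top; later layers overwrite earlier ones.
    cp -R "$layer"/. "$root"/
  done
}

# Demo: three layers with one zero-byte file each, like the example1 Dockerfile.
work=$(mktemp -d)
for n in 1 2 3; do
  mkdir -p "$work/layer$n"
  : > "$work/layer$n/testfile$n"
done
apply_layers "$work/rootfs" "$work/layer1" "$work/layer2" "$work/layer3"
ls "$work/rootfs"   # lists testfile1, testfile2, testfile3
```

Because layers are applied in order, a file added by a later layer replaces one at the same path from an earlier layer, which mirrors how the topmost layer wins in the merged image.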

Notice also that we can see the commands that were used to produce each layer. We can also see the hash value of the source file and which file in the image was updated.
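The per-layer commands that Dive displays come from the image’s history metadata, which `docker history --no-trunc --format '{{.CreatedBy}}' example1` prints newest first. The sketch below is a hypothetical helper, not part of Dedockify or Docker: it reverses those lines and rewrites them as Dockerfile-style instructions. It assumes the classic builder used above, which records instructions as `/bin/sh -c #(nop) ...` and `RUN` steps as `/bin/sh -c ...`; images built with BuildKit record their history in a different form, and build-arg prefixes are not handled.

```shell
# Hypothetical helper: convert `docker history` CreatedBy lines (newest
# first, classic-builder format) into Dockerfile-style lines (oldest first).
history_to_dockerfile() {
  # Reverse the input portably (tac is not available everywhere).
  awk '{ lines[NR] = $0 } END { for (i = NR; i >= 1; i--) print lines[i] }' |
    sed -e 's|^/bin/sh -c #(nop)[[:space:]]*||' \
        -e 's|^/bin/sh -c |RUN |' \
        -e 's|[[:space:]]*$||'
}

# Usage (requires Docker and the example1 image):
#   docker history --no-trunc --format '{{.CreatedBy}}' example1 | history_to_dockerfile
```

For example1 this would yield three `COPY file:<hash> in /` lines, still without the original filenames.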

If we take note of the items in the Command: section, we should see the following:

```text
#(nop) COPY file:e3c862873fa89cbf2870e2afb7f411d5367d37a4aea01f2620f7314d3370edcc in /
#(nop) COPY file:2a949ad55eee33f6191c82c4554fe83e069d84e9d9d8802f5584c34e79e5622c in /
#(nop) COPY file:aa717ff85b39d3ed034eed42bc1186230cfca081010d9dde956468decdf8bf20 in /
```

Each command provides solid insight into the original Dockerfile instruction that produced the image. However, the original filenames are lost. It appears that the only way to recover this information is to observe the changes to the target filesystem, or perhaps to infer it from other details. More on this later.
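Since only a checksum survives in each `COPY` record, a reconstruction can at best pair that checksum with its destination and then reason about which files appeared in that layer. Note that the checksum is not simply a hash of the file contents: the three identical zero-byte files above produced three different values. The hypothetical `parse_copy` helper below is a sketch that extracts the source type, checksum, and destination from lines in the `#(nop) COPY <type>:<checksum> in <dest>` form, with or without a leading `/bin/sh -c`.

```shell
# Hypothetical helper: split classic-builder COPY records into
# "<source type> <checksum> <destination>". Non-COPY lines are dropped.
parse_copy() {
  sed -n 's|^.*#(nop)[[:space:]]*COPY \([a-z]*\):\([0-9a-f]*\) in \(.*[^[:space:]]\)[[:space:]]*$|\1 \2 \3|p'
}
```

Piping the three `Command:` entries above through `parse_copy` would print three `file <checksum> /` lines, one per layer.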


dzone.com
