Dockerizing a Simple Python Process

This is part two in a series on taking a simple Python project from local script to production. In part one I talked about a gotcha I ran into when converting an old project from Python 2 to Python 3.

This part will go over how I put my Python process, its inputs, and its outputs into a Docker container and made an image publicly available on Dockerhub.

Requirements that I won't go over here (go to docker.com and follow the instructions there):

  • Download Docker
  • Create a Docker ID
  • Log in with your Docker ID on Dockerhub

What is Docker?

Docker is a containerization platform. Containerization is a way to package units of code with their dependencies so that they have everything they need to run in isolation.

Using Docker can help fix the "it works on my machine" problem, and writing dockerized code is a great way to encourage thoughtful code practices. Docker containers should be simple, responsible for as little as possible, and dependent on as few externals as possible.

Docker image vs docker container

Throughout this post, and online, you'll see the terms container and image. An image is basically a snapshot of your dockerized code, created when you use the docker build command - more on that below. Using docker run on an image starts a container. So a container is a running instance of an image.

Anatomy of a Dockerfile

I decided to dockerize my csv writer from the previous post in this series so that I could move it between environments easily.

For this I needed a Dockerfile. A Dockerfile is a text file that does not have a file extension.

Here's what the dockerfile for my Python code looks like:

FROM python:3.7
ARG export_file=goodreads.csv
COPY $export_file goodreads_export.csv
COPY converter.py /
CMD ["python", "./converter.py"]

FROM

The FROM instruction declares the base image - in other words, a dependency. Docker containers don't have languages automatically loaded. To access Python to run the code, we need to instruct the image to build on python:3.7.

A note on Docker registries: the default Docker registry is Dockerhub. If a docker image is available on Dockerhub, you don't need to specify a url when pulling or pushing from a docker repo. You just need the author's username and the repo name. For example, you can pull the docker image from this post with the command docker pull thejessleigh/goodreads-libib-converter. If you're using a different registry you'll need to tell Docker where to go. For example, if you're using Quay you'd do docker pull quay.io/example-username/test-docker-repo. The python dependency in my Dockerfile doesn't have a username because it's an official repo hosted on Dockerhub.

ARG

ARG declares an argument. It is the only instruction in a Dockerfile that can precede FROM, although I prefer to have FROM come first for the sake of consistency.

In the above example, I declare an ARG export_file and give it a default. It expects a file called goodreads.csv in the same directory as the Dockerfile. If I want to pass in something different, I instruct it to use a different filename with --build-arg=export_file=my_goodreads_export.csv when building the image.

COPY

COPY and ADD duplicate the contents of a file into the docker image. This is where I'm importing the input file and also the actual Python code that the Docker image executes.

COPY takes two arguments:

  • the source: the location of the file on your machine that you're copying into the image
  • the destination: the location that file will have inside the docker image

So whatever file I include as the CSV to convert will be referred to as goodreads_export.csv inside the Docker container. This is nifty, because it means that no matter what file I build the docker image with, the filename will always be consistent. I don't have to worry about making the Python code handle different filenames or paths. It can always look for ./goodreads_export.csv.
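converter.py itself isn't shown in this post, but a minimal sketch of its shape might look like this (the Libib column names and mapping below are hypothetical, not the real conversion):

```python
import csv

# The Dockerfile's COPY step guarantees the input is always at
# ./goodreads_export.csv, no matter what file the image was built with.
INPUT_FILE = "goodreads_export.csv"
OUTPUT_FILE = "libib_export.csv"

def convert(input_path=INPUT_FILE, output_path=OUTPUT_FILE):
    with open(input_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(output_path, "w", newline="") as f:
        # Hypothetical mapping from Goodreads columns to Libib columns.
        writer = csv.DictWriter(f, fieldnames=["title", "creators"])
        writer.writeheader()
        for row in rows:
            writer.writerow({"title": row["Title"], "creators": row["Author"]})
    # Print debugging, so docker run shows how many rows were converted.
    print(f"Converted {len(rows)} rows")

# When run via the Dockerfile's CMD (python ./converter.py), the real
# script simply calls convert().
```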

RUN

RUN issues an instruction that is executed and committed as part of the image. If I were dockerizing a Python project that needed to install external packages, I could use RUN to pip install those dependencies. However, converter.py is a very simple process that doesn't need external packages, so I don't need to run anything as part of my build process.
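If the project did need external packages, the step might look something like this (the requirements.txt here is hypothetical; converter.py doesn't have one):

```dockerfile
COPY requirements.txt /
RUN pip install -r requirements.txt
```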

CMD

There can only be one CMD instruction per Dockerfile. If the Dockerfile contains multiple CMDs, only the last one will execute.

CMD specifies the command you intend the container to execute when you run an instance of the image. It is not executed as part of the build process for an image; this is what distinguishes CMD from RUN.

Building a docker image

Now we have everything necessary to build a Docker image for our Python code from the Dockerfile.

As stated above, a Docker image is an inert snapshot of an environment that is ready to execute a command or program, but has not yet executed that command.

To build using the above Dockerfile, we run

docker build --build-arg=export_file=goodreads_export.csv -t goodreads-libib-converter .

--build-arg tells Docker to build the image with a file called goodreads_export.csv, overriding the default expectation of goodreads.csv.

-t goodreads-libib-converter "tags" the image as goodreads-libib-converter. This is how you give your image a human readable REPOSITORY name.

. tells Docker to look for a Dockerfile to build in the current directory.

After I do this, I can see that the image was successfully created by checking my image list.

> docker image list
REPOSITORY                 TAG       IMAGE ID       CREATED             SIZE
goodreads-libib-converter  latest    1234567890     12 seconds ago      924MB

Running a Docker container

Now that I have an image, I have a standalone environment capable of running my program, but it hasn't actually executed the core procedure specified with CMD yet. Here's how I do that:

docker run goodreads-libib-converter

I see the print debugging statements I have in my converter.py file execute, so I know how many CSV rows are being converted. When I ran the program locally, it created an output file called libib_export.csv. However, when I check the contents of my directory now, it's not there. How is that useful!?

Accessing Files Written Out

I'm no longer running the Python code in the directory I was before. I'm running it inside the Docker container. Therefore, any files that are written out will also be stored inside the Docker container. The output file doesn't do me much good in there!

I'm running the Docker container locally, so all I have to do is find the container and copy the output file from its dockerized location to the place I actually want it.

docker cp container_id:/libib_export.csv ~/outputs/libib_export.csv


This extracts the resultant CSV output from converter.py and puts it somewhere I can access it.

I can figure out the container_id (or the human readable name) with

> docker ps -a
CONTAINER ID  IMAGE                   COMMAND                  CREATED             NAMES
e00000000000  goodreads-libib-converter  "python ./converter.…"   24 seconds ago      naughty_mcclintock

Yes, naughty_mcclintock is actually the procedurally generated name for the container I've been working with locally.

Copying a file from a container to my desired location is fine for a local environment, but it has limited uses if I ever want to take this project to production. There are other, better options for dealing with output files from Docker containers, but we'll get into those ✨ in another installment in this series ✨

Committing a docker image

After we've run the container to confirm that it works, we probably want to create a new image based on the changes it made when it executed. This prepares the image that we want to push up to an external Docker registry, like Dockerhub.

When committing a Docker image, we need to specify the registry (if it's something other than dockerhub), the author name, the repository name, and the tag name.

docker commit -m "Working Python 3 image" naughty_mcclintock thejessleigh/goodreads-libib-converter:python3

My docker commit was successful, so I see a sha256 hash output in my terminal. Creating a commit message is, of course, optional, but I like to do it to keep organized.

A note on Docker image tags: when you pull a Docker image and don't specify a tag, it will use the default tag (usually latest). Tags are the way you can keep track of changes in your project without overwriting previous versions. For example, if you (for some reason) are still using Python 2, you can access the Python 2 image by running docker pull thejessleigh/goodreads-libib-converter:python2. Right now the :python3 and latest tags on my Docker repo are the same, but you can pull either one.

Pushing a docker image to Dockerhub

Now that I have an image I want to put out into the world, I can push it up to Dockerhub.

First, I need to log into Dockerhub and create a repository. Repositories require a name, and should have a short description which details the purpose of the project, and a long description that explains dependencies, requirements, build arguments, etc. You can also make a Docker repository private.

Once I've done that, I run docker push, which sends the latest commit of the project and tag I've specified up to the external registry. If you don't specify a tag, the push will override the latest tag in your repository.

docker push thejessleigh/goodreads-libib-converter:python3

If you go to my Dockerhub profile you can see the goodreads-libib-converter project, and pull both the Python 2 and Python 3 incarnations.

Next Steps

Now that I have a working Docker image, I want to put it into production so that anyone can convert their Goodreads library CSV into a Libib library CSV. I'm going to go about this using AWS, which requires a bit of setup.

The next installment in this series will go over setting up an AWS IAM account, setting up awscli and configuring your local profiles, and creating an s3 bucket that your IAM account can access.

Linking a Python app Docker container and a Postgres Docker container

I have two docker containers running by the following commands:

  • docker run --name postgres -v "/Users/xxx/Desktop/Coding/DockerMounting":/home/ -e POSTGRES_PASSWORD=xyz -d postgres
  • docker run -it -v "/Users/xxx/Desktop/Coding/DockerMounting":/home/t -p 5000:5000 --name some-app --link postgres:postgres -d xxx/ubuntu:latest

I have created the necessary user, database and table in my postgres (psql) container.

I am trying to run a python script:

import os

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine(os.getenv("DATABASE_URL"))
db = scoped_session(sessionmaker(bind=engine))

def main():
    flights = db.execute("SELECT origin, destination, duration FROM flights").fetchall()
    for flight in flights:
        print(f"{flight.origin} to {flight.destination}, {flight.duration} minutes.")

if __name__ == "__main__":
    main()

I get the following error:

  File "list.py", line 6, in <module>
    engine = create_engine(os.getenv("DATABASE_URL"))
  File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/__init__.py", line 435, in create_engine
    return strategy.create(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/strategies.py", line 56, in create
    plugins = u._instantiate_plugins(kwargs)

I know one issue is that I need to set the DATABASE_URL environment variable, but I am not sure what that value should be.
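With --link postgres:postgres, the app container can reach the database container under the hostname postgres, so the URL usually takes the shape below. This is a sketch; the user, password, and database names are placeholders for whatever was created in psql:

```python
import os

# Placeholders: substitute the user, password, and database you created in psql.
user, password, database = "myuser", "xyz", "mydb"

# "postgres" is the hostname the --link flag gives the database container;
# 5432 is the default Postgres port.
os.environ["DATABASE_URL"] = f"postgresql://{user}:{password}@postgres:5432/{database}"
print(os.environ["DATABASE_URL"])
```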

Docker-first Python development

In this article, we'll be talking about how to start using Docker for Python development.

I've always been a bit annoyed at how difficult it can be to avoid shipping test code and dependencies with Python applications. A typical build process might look something like:

  1. create a virtual environment
  2. install service dependencies
  3. install test dependencies
  4. run tests
  5. package up code and dependencies into an RPM.

At this point, my service dependencies and test dependencies are intermingled in the virtual environment. To detangle them, I now have to do something like destroy the venv and create a new one, reinstalling the service dependencies.

Regardless of the packaging method, I don't want to pull down dependencies when I deploy my service.

At Twilio, we are in the process of embracing container-based deployments. Docker containers are great for Python services as you no longer have to worry about multiple python versions or virtual environments. You just use an image with exactly the version of Python your service needs and install your dependencies directly into the system.

One thing I've noticed is that while many services are built and packaged as Docker images, few use exclusively Docker-based development environments. Virtual environments and pyenv .python-version files abound!

I recently started writing a new Python service with the knowledge that this would be exclusively deployed via containers. This felt like the right opportunity to go all in on containers and build out a strategy for Docker-first localdev. I set out with the following goals:

  1. don't ship tests and test dependencies with the final image
  2. tests run as part of the Docker build
  3. failing tests will fail the build
  4. IDE (PyCharm) integration

A bit of research (aka Googling) suggested that multi-stage builds might be useful in this endeavor. Eventually I ended up with a Dockerfile that looks something like this:

FROM python:3 as builder
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY src ./src

FROM builder as tests
COPY test_requirements.txt ./
RUN pip install -r test_requirements.txt
COPY tests ./tests
RUN pytest tests

FROM builder as service
COPY docker-entrypoint.sh ./
ENTRYPOINT ["docker-entrypoint.sh"]
EXPOSE 3000

When building an image from this Dockerfile, Docker will build 3 images, one for each of the FROM statements in the Dockerfile. If you've worked with Dockerfiles before, you know that statement ordering is critical for making efficient use of layer caching, and multi-stage builds are no different. Docker builds each of the stages in the order they are defined. All of the intermediate stages are ephemeral; only the last image is output by the build process.

In this case, the first stage (builder) builds an image with all the service dependencies and code. The second stage (tests) installs the test requirements and test code, and runs the tests. If the tests pass, the build process will continue on to the next stage. If the tests fail, the entire build will fail. This ensures that only images with passing tests are built! Finally, the last stage (service) builds on top of our builder image, adding the entrypoint script, defining the entrypoint command and exposing port 3000.

So how did I do wrt the initial goals?

  1. don't ship tests and test dependencies with the final image ✓
  2. tests run as part of the Docker build ✓
  3. failing tests will fail the build ✓
  4. IDE (PyCharm) integration ❌

I've met most of the goals, but what about the actual development experience? If I open up PyCharm and import my source code, it complains that I have unsatisfied dependencies :( Fortunately PyCharm Professional has the ability to select a python interpreter from inside a Docker image! Cool, but I have to build the image before I can use its interpreter. But thanks to goal #3, if my tests are failing, I can't build my image...

Lucky for us, we can tell docker build to stop after one of our intermediate stages explicitly with the --target flag. Now if I run docker build --target builder ., I can select the interpreter from the builder image.

Uh oh! The builder image doesn't include my test dependencies! Of course, that's the whole point of the builder image. Let's add another stage we can use for running and debugging our tests.

FROM python:3 as builder
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY src ./src

FROM builder as localdev
COPY test_requirements.txt ./
RUN pip install -r test_requirements.txt
COPY tests ./tests
CMD ["pytest", "tests"]

FROM localdev as tests
RUN pytest tests

FROM builder as service
COPY docker-entrypoint.sh ./
ENTRYPOINT ["docker-entrypoint.sh"]
EXPOSE 3000

With the localdev stage, I can build an image with all my service and test code and dependencies. I can even make the localdev container run the tests by default when the container is run. By using the interpreter from this image, I can now debug my failing tests.

Let's take a look again at the initial goals:

  1. don't ship tests and test dependencies with the final image ✓
  2. tests run as part of the Docker build ✓
  3. failing tests will fail the build ✓
  4. IDE (PyCharm) integration ✓

Hooray!

Except there's one thing still bothering me: changes to the service code trigger a reinstallation of our test dependencies. Yuck! Let's take another whack at our Dockerfile:

FROM python:3 as base
COPY requirements.txt ./
RUN pip install -r requirements.txt

FROM base as builder
COPY src ./src

FROM base as test-base
COPY test_requirements.txt ./
RUN pip install -r test_requirements.txt

FROM test-base as localdev
COPY src ./src
COPY tests ./tests
CMD ["pytest", "tests"]

FROM localdev as tests
RUN pytest tests

FROM builder as service
COPY docker-entrypoint.sh ./
ENTRYPOINT ["docker-entrypoint.sh"]
EXPOSE 3000

Ok that seems pretty complicated, here's a graph of our image topology:

[Image: graph of the image topology, showing how the stages build on one another]

I don't love that the builder and localdev stages both copy over the source directory, but the real question is: does this still meet our initial goals while avoiding excessive re-installs of test dependencies? Yeah, it seems to work pretty well. Thanks to Docker's layer caching, we rarely have to re-install dependencies.

Originally published by Jeremy Moore at dev.to
