I’ve been using Apache Airflow for around two years now to build out custom workflow interfaces, like those used for Laboratory Information Management Systems (LIMS), computer vision pre- and post-processing pipelines, and to set and forget various genomics pipelines.

My favorite feature of Airflow is how completely agnostic it is to the work you are doing or where that work is taking place. It could take place locally, in a Docker container, on Kubernetes, on any number of AWS services, on an HPC system, and so on. Using Airflow lets me concentrate on the business logic of what I’m trying to accomplish without getting too bogged down in implementation details.

During that time I’ve adopted a set of systems that I use to quickly build out the main development stack with Docker and Docker Compose, using the Bitnami Apache Airflow stack. Generally, I deploy to production either with the same Docker Compose stack, if it’s a small, isolated instance, or with Kubernetes when I need to interact with other services or file systems.

Bitnami vs Roll Your Own

I used to roll my own Airflow containers using Conda. I still use this approach for most of my other containers, including microservices that interact with my Airflow system, but configuring Airflow is a lot more than just installing packages. Even installing those packages is a pain, and I could rarely count on a rebuild actually working without trouble. Then, on top of the packages, you need to configure database connections and a message queue.
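To give a sense of what rolling your own involves, here is a rough sketch for the 1.10.x era with Celery workers. The constraint URL and the connection strings are illustrative placeholders, not a working config:

# Installing against the official constraints file is the recommended way to get
# a reproducible build (adjust the Python version in the URL to match yours).
pip install "apache-airflow[celery,postgres,redis]==1.10.10" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-1.10.10/constraints-3.7.txt"

# Then you still have to wire up the metadata database and the message queue.
# Placeholder values -- point these at your own Postgres and Redis.
export AIRFLOW__CORE__EXECUTOR=CeleryExecutor
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
export AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
export AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:airflow@postgres:5432/airflow

# And finally run each service yourself (1.10.x CLI).
airflow initdb
airflow webserver &
airflow scheduler &
airflow worker &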

In comes the Bitnami Apache Airflow Docker Compose stack for dev and the Bitnami Apache Airflow Helm chart for prod!

Bitnami, in their own words:

_Bitnami makes it easy to get your favorite open source software up and running on any platform, including your laptop, Kubernetes and all the major clouds. In addition to popular community offerings, Bitnami, now part of VMware, provides IT organizations with an enterprise offering that is secure, compliant, continuously maintained and customizable to your organizational policies._ (https://bitnami.com/)

Bitnami stacks (usually) work exactly the same from their Docker Compose stacks to their Helm charts. This means I can test and develop locally using my Compose stack, build out new images, versions, packages, etc., and then deploy to Kubernetes. The configuration, environment variables, and everything else act the same. It would be a fairly large undertaking to do all this from scratch, so I use Bitnami.
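In practice that loop looks something like this. Treat it as a sketch: the compose file location in Bitnami’s GitHub repositories, the Helm 3 syntax, and the my-airflow release name are all things to verify against Bitnami’s current docs.

# Dev: grab Bitnami's docker-compose.yml for Airflow and run the stack locally.
# Adjust the URL to wherever Bitnami currently publishes the compose file.
curl -L -o docker-compose.yml \
  https://raw.githubusercontent.com/bitnami/bitnami-docker-airflow/master/docker-compose.yml
docker-compose up -d

# Prod: the same stack as a Helm chart (Helm 3 syntax, placeholder release name).
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-airflow bitnami/airflow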

They have plenty of enterprise offerings, but everything included here is open source and there is no paywall involved.

And no, I am not affiliated with Bitnami, although I have kids that eat a lot and don’t have any particular ethical aversions to selling out. ;-) I’ve just found their offerings to be excellent.

Project Structure

I like to have my projects organized so that I can run tree and have a general idea of what’s happening.

Apache Airflow has three main components: the web application, the worker, and the scheduler. Each of these has its own Docker image to separate out the services. Additionally, there is a database and a message queue, but we won’t be doing any customization to these.

.
└── docker
    └── bitnami-apache-airflow-1.10.10
        ├── airflow
        │   └── Dockerfile
        ├── airflow-scheduler
        │   └── Dockerfile
        ├── airflow-worker
        │   └── Dockerfile
        ├── dags
        │   └── tutorial.py
        └── docker-compose.yml

So what we have here is a directory called bitnami-apache-airflow-1.10.10, which brings us to a very important point: pin your versions! It will save you so, so much pain and frustration!

Then we have one Dockerfile per Airflow component.

Create this directory structure with:

mkdir -p docker/bitnami-apache-airflow-1.10.10/{airflow,airflow-scheduler,airflow-worker,dags}
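If you want placeholder Dockerfiles to go with those directories, something like the following works as a starting point. The bitnami/airflow, bitnami/airflow-scheduler, and bitnami/airflow-worker images are Bitnami’s published images, but double-check the exact 1.10.10 tag on Docker Hub before pinning it:

# Minimal starting Dockerfiles: each one just extends the matching Bitnami
# image, pinned to 1.10.10, ready for you to layer your own packages on top.
cat > docker/bitnami-apache-airflow-1.10.10/airflow/Dockerfile <<'EOF'
FROM bitnami/airflow:1.10.10
EOF

cat > docker/bitnami-apache-airflow-1.10.10/airflow-scheduler/Dockerfile <<'EOF'
FROM bitnami/airflow-scheduler:1.10.10
EOF

cat > docker/bitnami-apache-airflow-1.10.10/airflow-worker/Dockerfile <<'EOF'
FROM bitnami/airflow-worker:1.10.10
EOF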
