The inspiration for writing this article came after reading the following on Airflow extensibility using CWL-Airflow:

  • CWL-Airflow: a lightweight pipeline manager supporting Common Workflow Language, GigaScience, Volume 8, Issue 7, July 2019, giz084, Michael Kotliar, Andrey V Kartashov, Artem Barski

The latest documentation, published by the developers above, can be found at https://cwl-airflow.readthedocs.io/en/latest.

Drawing on that documentation and on concepts learned from https://github.com/puckel/docker-airflow, what follows is an outline of how to set up a cwl-airflow Docker Compose stack.

A git repo containing the stack components can be found at this link.

This stack is not intended for use on a public network. The host environment was Windows 10 with WSL2 running Ubuntu 20.04 LTS from the Microsoft Store, with WSL2 integration enabled in Docker Desktop.

Preliminary Notes

The core components are Airflow 1.10.11 and CWL-Airflow 1.2.2. Airflow was configured to run in CeleryExecutor mode with a single worker and a MySQL metadata backend.

Successfully executing CWL workflows that have a DockerRequirement specification took some trial and error, and a few workarounds were needed to get things working as they should (mostly). The difficulty was mainly related to the host:container volume mounts of the CWL-Airflow specific folders (cwl_tmp_folder, cwl_outputs_folder, cwl_inputs_folder, cwl_pickle_folder). The absolute paths for these folders need to be specified in docker-compose.yml, and the host paths must match the absolute paths within the Airflow containers. By default, CWL-Airflow creates these folders at $AIRFLOW_HOME/.

To address this issue, a user named airflow was created both on the host and in the Docker image, with a $HOME directory on the host identical to $AIRFLOW_HOME within the Airflow containers.
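The path-matching requirement can be illustrated with a minimal Python sketch that builds docker-compose style volume entries in which the host path and the container path are the same absolute path. This is not taken from the stack's repository: the helper name identical_path_mounts is hypothetical, the value /usr/local/airflow is only an assumed example of $AIRFLOW_HOME, and the on-disk folder names are assumed to match the names listed above.

```python
from pathlib import PurePosixPath
from typing import List

# Folder names as listed in the text; assumed to be the directory names
# that CWL-Airflow creates under $AIRFLOW_HOME.
CWL_FOLDERS = (
    "cwl_tmp_folder",
    "cwl_outputs_folder",
    "cwl_inputs_folder",
    "cwl_pickle_folder",
)


def identical_path_mounts(airflow_home: str) -> List[str]:
    """Build 'host:container' volume entries whose two sides are identical.

    Hypothetical helper for illustration only: airflow_home stands for both
    the airflow user's $HOME on the host and $AIRFLOW_HOME in the containers.
    """
    home = PurePosixPath(airflow_home)
    if not home.is_absolute():
        raise ValueError("airflow_home must be an absolute path")
    # Same absolute path on both sides of the colon for every CWL folder.
    return [f"{home / name}:{home / name}" for name in CWL_FOLDERS]


# Assumed example value; substitute the real $AIRFLOW_HOME of your image.
for mount in identical_path_mounts("/usr/local/airflow"):
    print(f"- {mount}")
```

Each printed line has the shape of an entry under a service's volumes key in docker-compose.yml, so the same list would apply to every Airflow service that needs access to the CWL folders.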


Setup and Run CWL-Airflow Workflows with Docker Compose