The inspiration for writing this article came after reading the following on Airflow extensibility using CWL-Airflow:
The latest documentation, published by the above developer(s), can be found at https://cwl-airflow.readthedocs.io/en/latest.
Drawing on the documentation linked above and on concepts learned from https://github.com/puckel/docker-airflow, what follows is an outline of how to set up a cwl-airflow Docker Compose stack.
A git repo containing the stack components can be found at this link.
This stack is not intended for use on a public network. The host environment was Windows 10 with WSL2 running Ubuntu 20.04 LTS from the Microsoft Store, with WSL2 integration enabled in Docker Desktop.
The core components are Airflow 1.10.11 and CWL-Airflow 1.2.2. Airflow was configured to run in CeleryExecutor mode with a single worker and a MySQL metadata backend.
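As a rough illustration of the configuration described above, the sketch below builds the environment variables that Airflow 1.10 reads for the executor and the metadata database. The helper name airflow_env and the database user, password, host, and schema values are hypothetical placeholders, not taken from the stack's repository. CeleryExecutor also needs a message broker and result backend, which are left out here because the article does not name them.

```python
def airflow_env(db_user, db_password, db_host, db_name):
    """Build Airflow 1.10-style environment overrides (hypothetical helper).

    Airflow maps AIRFLOW__<SECTION>__<KEY> variables onto airflow.cfg entries.
    All database values passed in are placeholders chosen by the caller.
    """
    return {
        # Run tasks through Celery workers instead of the default executor.
        "AIRFLOW__CORE__EXECUTOR": "CeleryExecutor",
        # SQLAlchemy URI pointing at the MySQL metadata backend.
        "AIRFLOW__CORE__SQL_ALCHEMY_CONN": (
            f"mysql://{db_user}:{db_password}@{db_host}/{db_name}"
        ),
    }


if __name__ == "__main__":
    # Placeholder credentials and host name, for illustration only.
    env = airflow_env("airflow", "airflow", "mysql", "airflow")
    for key, value in env.items():
        print(f"{key}={value}")
```

In a Compose file these pairs would typically go under the environment key of each Airflow service, so that the webserver, scheduler, and worker all share the same settings.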
Successfully executing CWL workflows with a DockerRequirement specification took some trial and error, and a few workarounds were needed to get things working as they should (mostly). These mainly concerned the host:container volume mounts of the CWL-Airflow specific folders (cwl_tmp_folder, cwl_outputs_folder, cwl_inputs_folder, cwl_pickle_folder). The absolute paths for these folders need to be specified in docker-compose.yml and must match the absolute paths within the Airflow containers. By default, CWL-Airflow creates these folders at $AIRFLOW_HOME/.
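The matching-path requirement can be sketched as a small script that generates the volume entries for docker-compose.yml, using the same absolute path on both sides of each host:container mapping. This is a minimal sketch under assumptions: the helper name cwl_volume_mounts is hypothetical, the directory names are assumed to equal the folder names listed above, and /home/airflow is only an example value for $AIRFLOW_HOME.

```python
import posixpath

# Folder names as listed in the article; assumed to be the directory names
# that live directly under $AIRFLOW_HOME.
CWL_FOLDERS = [
    "cwl_tmp_folder",
    "cwl_outputs_folder",
    "cwl_inputs_folder",
    "cwl_pickle_folder",
]


def cwl_volume_mounts(airflow_home):
    """Return 'host:container' mount strings with identical absolute paths.

    Hypothetical helper: airflow_home must be an absolute POSIX path that
    exists on the host and is also $AIRFLOW_HOME inside the containers.
    """
    if not posixpath.isabs(airflow_home):
        raise ValueError("airflow_home must be an absolute path")
    mounts = []
    for name in CWL_FOLDERS:
        path = posixpath.join(airflow_home, name)
        # Same path on the host side and the container side of the mapping.
        mounts.append(f"{path}:{path}")
    return mounts


if __name__ == "__main__":
    # Example value only; substitute the real $AIRFLOW_HOME.
    for mount in cwl_volume_mounts("/home/airflow"):
        print(f"      - {mount}")
```

The printed lines can be pasted under the volumes key of each Airflow service. Generating them from a single variable makes it harder for the host and container paths to drift apart.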
To address this issue, a user named airflow was created both on the host and in the Docker image, with a $HOME directory on the host identical to $AIRFLOW_HOME within the Airflow containers.
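A quick way to sanity-check this workaround on the host is to compare the airflow user's home directory with the value used for $AIRFLOW_HOME in the containers. The sketch below assumes a Linux host (it relies on the standard pwd module), and the function names and the /home/airflow example path are hypothetical.

```python
import os
import pwd  # Unix-only standard library module


def same_path(a, b):
    """Compare two paths after normalising trailing slashes and '.' parts."""
    return os.path.normpath(a) == os.path.normpath(b)


def home_matches_airflow_home(username, airflow_home):
    """Return True if the host user's home directory equals airflow_home.

    Hypothetical helper: returns False when the user does not exist on the
    host, which is the situation the workaround is meant to avoid.
    """
    try:
        home = pwd.getpwnam(username).pw_dir
    except KeyError:
        return False
    return same_path(home, airflow_home)


if __name__ == "__main__":
    # Example value only; use the $AIRFLOW_HOME set in the Docker image.
    ok = home_matches_airflow_home("airflow", "/home/airflow")
    print("host home matches AIRFLOW_HOME" if ok else "mismatch or user missing")
```

Running this before starting the stack gives an early warning if the host user is missing or its home directory differs from the container's $AIRFLOW_HOME.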