Data science models are typically meant to run periodically. For example, if we are predicting customer churn for the next month, the model has to run on the last day of each month. Running it manually every month is not a realistic option, so we need a scheduler to automate the process. Apache Airflow is an ideal tool for this, as it lets you schedule and monitor your workflows. In this article we will look at how to deploy Apache Airflow using Docker while keeping room to scale up further. Familiarity with Apache Airflow and Docker concepts will help you follow along.
Airflow consists of three major components: a web server, a scheduler and a meta database. The web server provides the user interface through which users interact with the application. The scheduler takes care of job scheduling, while the meta database stores the scheduling details. Airflow supports several executors, but the Celery executor is the most suitable one for scaling out. With the Celery executor, three more components are added: workers, a message broker and a worker monitor. Workers execute the jobs triggered by the scheduler; there can be multiple workers, and they can be distributed across instances in a cluster. The number of workers can be decided based on the workload the system has to handle and the capabilities of the machines. The message broker is what Celery uses to pass tasks between the scheduler and the workers, and a monitoring tool can be used to keep an eye on the Celery workers.
Apache Airflow with Celery Executor (Image by author)
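To give an idea of how these pieces are wired together, the snippet below shows the kind of environment variables that point Airflow at the Celery executor, the message broker and the meta database. The host names (postgres, redis) and credentials are placeholders that assume the containers described later; treat this as a sketch rather than the final configuration.

```yaml
# Environment variables that tell Airflow to use the Celery executor
# and where to find the meta database and the message broker.
# Host names and credentials are placeholders for the containers defined later.
environment:
  AIRFLOW__CORE__EXECUTOR: CeleryExecutor
  AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
  AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
  AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
```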
With Docker, we plan to run each of the above components inside its own container. The web server, scheduler and workers will share a common Docker image, as sketched below. This image is specific to the project, and the Dockerfile used to build it will be discussed. All the other containers will use publicly available images directly.
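One way to express "same image, different role" in Docker Compose is a YAML anchor that the web server, scheduler and worker services all extend. The image name, file names and commands below are assumptions for illustration; the actual image is the one built from the project Dockerfile, and the commands assume its entrypoint simply runs whatever command is given.

```yaml
# A shared definition reused by the web server, scheduler and workers.
# "my-airflow:latest" is a placeholder for the image built from the project Dockerfile.
x-airflow-common: &airflow-common
  image: my-airflow:latest
  env_file: .env               # the Airflow environment variables shown earlier

services:
  webserver:
    <<: *airflow-common
    command: airflow webserver
    ports:
      - "8080:8080"
  scheduler:
    <<: *airflow-common
    command: airflow scheduler
  worker:
    <<: *airflow-common
    command: airflow celery worker   # "airflow worker" on Airflow 1.10.x
```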
For this tutorial, PostgreSQL is used as the meta database, Redis as the message broker and Celery Flower as the worker monitor. Since there are multiple containers, Docker Compose makes it easy to deploy all of them at once.
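The supporting services can use stock images straight from Docker Hub. A minimal sketch of those services might look like the following; image tags, ports, volume names and credentials are illustrative assumptions, not a production setup.

```yaml
# Supporting services, using publicly available images.
services:
  postgres:                        # meta database
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-data:/var/lib/postgresql/data

  redis:                           # message broker for Celery
    image: redis:6

  flower:                          # Celery Flower, for monitoring workers
    <<: *airflow-common            # same project image as the other Airflow services
    command: airflow celery flower # "airflow flower" on Airflow 1.10.x
    ports:
      - "5555:5555"

volumes:
  postgres-data:
```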