Airflow is a powerful, flexible workflow automation and scheduling system written in Python, with a rich set of integrations and tools available out of the box.
Although the sheer volume of documentation and large configuration files can make learning Airflow look daunting, it is easy to get a simple or more complex configuration up and running quickly, so developers can begin writing code and learn how the product actually works. In this article I will demonstrate how to do this with Airflow and Docker, using a public GitHub repo I authored here:
https://github.com/suburbanmtman/airflow-intro
First, some key Airflow terminology:

**DAG** — directed acyclic graph; defines how a workflow runs and appears in the dashboard on the web interface (see the minimal sketch after this list).

**Worker** — one or more systems responsible for running the code for tasks pulled from the workflow task queues.

**Webserver** — serves the UI for managing Airflow, handles user requests for running tasks, and receives updates from DAG runs via workers.

**Scheduler** — determines whether a Task needs to be run and triggers the work to be processed by a Worker.

**Operator** — a template for a step of work that gets run inside a DAG.

**Task** — an Operator instantiated inside a DAG; a single unit of work.

**Task Instance** — a specific run of a Task, created by the scheduler, along with its stored state.
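To tie these terms together, below is a minimal sketch of a DAG definition file. It assumes an Airflow 1.x-style environment; the DAG id "intro_dag" and the echo commands are illustrative, not taken from the repo.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # airflow.operators.bash in Airflow 2+

# Arguments applied to every task in the DAG.
default_args = {
    "owner": "airflow",
    "start_date": datetime(2019, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# The DAG groups tasks and defines their schedule and dependency graph;
# this is what shows up in the web interface dashboard.
dag = DAG("intro_dag", default_args=default_args, schedule_interval="@daily")

# Instantiating an Operator inside a DAG creates a Task.
extract = BashOperator(task_id="extract", bash_command="echo extracting", dag=dag)
report = BashOperator(task_id="report", bash_command="echo reporting", dag=dag)

# ">>" declares an edge of the graph: report runs only after extract succeeds.
extract >> report
```

Dropped into the DAGs folder, a file like this is parsed by the scheduler, the DAG appears in the web UI, and each scheduled run creates Task Instances for extract and report that a Worker picks up and executes.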