In this blog, we are going to run a sample dynamic DAG using Docker.
Before that, let’s get a quick idea of Airflow and some of its terms.
What is Airflow?
Airflow is a workflow engine responsible for scheduling and running jobs and data pipelines. It ensures that jobs run in the correct order based on their dependencies, and it also manages resource allocation and failure handling.
Before going forward, let’s get familiar with the terms:
Task or Operator: A defined unit of work.
Task instance: An individual run of a single task. The states could be running, success, failed, skipped, and up for retry.
DAG (Directed Acyclic Graph): A set of tasks with an execution order.
DAG Run: An individual run of a DAG.
Web Server: The Airflow UI. It also lets us manage users, roles, and other configuration for the Airflow setup.
Scheduler: Schedules the jobs and orchestrates the tasks. It uses the DAG objects to decide which tasks need to run, when, and where.
Executor: Executes the tasks. Airflow supports several types of executors.
Metadata Database: Stores the Airflow state. Airflow uses SQLAlchemy, a Python Object Relational Mapping (ORM) library, to connect to the metadata database.
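To make the DAG, task instance, and scheduler terms concrete, here is a minimal sketch in plain Python. It does not use Airflow's API: the `dag` dictionary, the task names, and the helpers `execution_order` and `run_dag` are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for the ordering work a scheduler does. Marking downstream tasks as skipped after a failure is a simplification of this sketch; Airflow itself tracks more states than the ones shown here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task id maps to the set of tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(dependencies):
    """Return one valid run order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


def run_dag(dependencies, callables):
    """Run every task in dependency order and record one state per task instance."""
    states = {}
    for task_id in execution_order(dependencies):
        # A task only runs when all of its upstream tasks succeeded.
        if any(states[up] != "success" for up in dependencies[task_id]):
            states[task_id] = "skipped"  # simplification made by this sketch
            continue
        try:
            callables[task_id]()
            states[task_id] = "success"
        except Exception:
            states[task_id] = "failed"
    return states


print(execution_order(dag))
# ['extract', 'transform', 'load', 'notify']
```

Because the graph is acyclic, a valid order always exists. If a dependency loop were introduced, `TopologicalSorter` would raise `CycleError`, which is the reason the "acyclic" part of the DAG definition matters.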
Now that we are familiar with the terms, let’s get started.