Apache Airflow is a popular platform for programmatically authoring, scheduling, and monitoring workflows; it has been deployed by companies like Adobe, Airbnb, Etsy, Instacart, and Square. The advantage of defining workflows as code is that they become more maintainable, versionable, testable, and collaborative. In Airflow, workflows are authored as directed acyclic graphs (DAGs) of tasks, and the scheduler executes those tasks on an array of workers while respecting the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap, and the browser-based UI makes it easy to visualize pipelines running in production, monitor their progress, and troubleshoot issues when needed.

YugabyteDB, Apache Airflow, and Google Cloud Platform (GCP)

Why Airflow with a YugabyteDB backend?

By default, Airflow uses a SQLite database for its metadata store, which both the scheduler and web UI rely on. When Airflow runs in production, the SQLite backend is typically replaced with a traditional RDBMS like PostgreSQL. However, to keep PostgreSQL from becoming a single point of failure in the Airflow deployment, administrators still need to devise high-availability and failover strategies for it. There is a simpler alternative that Airflow can talk to exactly as it would PostgreSQL, but with high availability, support for multiple cloud and topology deployment options, and high performance built in: YugabyteDB.
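Because YugabyteDB's YSQL layer speaks the PostgreSQL wire protocol, repointing Airflow at it is largely a connection-string change. As a minimal sketch (the host, credentials, and database name below are illustrative assumptions, not values from this article), the metadata connection in airflow.cfg might look like this, with YSQL listening on its default port 5433:

    [core]
    # Illustrative YugabyteDB YSQL endpoint; create the "airflow" database beforehand.
    # Note: Airflow 2.3+ reads this key from the [database] section instead of [core].
    sql_alchemy_conn = postgresql+psycopg2://yugabyte:yugabyte@yb-tserver-0:5433/airflow
    # Any executor other than SequentialExecutor requires a non-SQLite backend like this one.
    executor = LocalExecutor

With the connection updated, initializing the metadata tables works the same as against PostgreSQL (airflow db init on Airflow 2.x, airflow initdb on 1.x); the scheduler and web UI use the same driver and SQL dialect they already use for PostgreSQL, so no code changes are needed.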


Part 1: Deploying a Distributed SQL Backend for Apache Airflow