A step-by-step tutorial for running Streaming ETL with Flink on Zeppelin. Let’s dive deeper into the Flink interpreter in Zeppelin Notebooks.

Apache Zeppelin 0.9 comes with a redesigned interpreter for Apache Flink that allows developers and data engineers to use Flink directly in Zeppelin notebooks for interactive data analysis. In the following paragraphs, we describe why stream processing frameworks like Apache Flink are a great fit for Streaming ETL. We then take a closer look at the Flink interpreter in Zeppelin notebooks through a tutorial that shows how developers can run Streaming ETL data pipelines with Flink on Zeppelin.


Streaming ETL and Apache Flink

Extract-transform-load (ETL) is a common operation related to massaging and moving data between storage systems. ETL jobs have historically been triggered periodically, frequently copying data from transactional database systems to an analytical database or a data warehouse.

Streaming ETL pipelines serve a similar purpose to traditional ETL: they transform and enrich data and can move it from one storage system to another. However, streaming ETL pipelines differ from traditional ETL in that they operate continuously: they read records from sources that continuously produce data and move that data, with low latency, to its desired destination.
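To make the contrast with periodic batch jobs concrete, here is a minimal conceptual sketch in plain Python. It is not Flink code. The record fields (user_id, amount), the user_countries lookup table, and the in-memory list standing in for a destination system are all hypothetical names chosen for illustration. The point it shows is that each record is extracted, transformed, and loaded one at a time as it arrives, rather than being collected into a batch first.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def source(events: Iterable[Record]) -> Iterator[Record]:
    """Extract: yield records one at a time, as an unbounded source would."""
    for event in events:
        yield event


def transform(records: Iterable[Record], countries: Dict[str, str]) -> Iterator[Record]:
    """Transform: drop incomplete records and enrich the rest with a country."""
    for record in records:
        if record.get("amount") is None:
            continue  # filter out records without an amount
        enriched = dict(record)  # copy so the input record is not mutated
        enriched["country"] = countries.get(str(record["user_id"]), "unknown")
        yield enriched


def sink(records: Iterable[Record], destination: List[Record]) -> None:
    """Load: write each record to the destination as soon as it is produced."""
    for record in records:
        destination.append(record)


# Hypothetical input; a real pipeline would read from a system such as Kafka.
events = [
    {"user_id": "u1", "amount": 10.0},
    {"user_id": "u2", "amount": None},
    {"user_id": "u3", "amount": 7.5},
]
user_countries = {"u1": "DE"}  # hypothetical enrichment table

warehouse: List[Record] = []  # stands in for the destination storage system
sink(transform(source(events), user_countries), warehouse)
print(warehouse)
```

Because the three stages are chained generators, no stage waits for the full input: a record passes through the whole pipeline before the next one is read. A stream processor such as Flink applies the same idea, and adds distribution, fault tolerance, and state management on top.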

Streaming ETL is a common use case for Apache Flink because Flink SQL (or the Table API), together with support for user-defined functions, covers most common data transformation and enrichment tasks. Additionally, Flink provides a rich set of connectors to various storage systems such as Kafka, Kinesis, Elasticsearch, and JDBC database systems. It also features continuous sources for file systems that monitor directories, and sinks that write files in a time-bucketed fashion. Let us now describe how the Flink interpreter works in Zeppelin notebooks.


Running Streaming ETL Pipelines with Apache Flink on Zeppelin Notebooks