In this blog, we will see how to read Avro files using Apache Flink.

Before reading the files, let’s get an overview of Flink.

There are two types of processing: **batch** and **real-time**.

  • **Batch Processing:** Processing based on data collected over time.
  • **Real-time Processing:** Processing based on data as it arrives, for an immediate result.
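
To make the difference concrete, here is a small sketch in plain Scala (no Flink involved). The object and method names are made up for illustration: the batch version waits for the whole collection and produces one answer, while the streaming version produces an updated answer for every element as it arrives.

```scala
object BatchVsStreaming {
  // Batch: the full data set is available before processing starts,
  // so a single result is computed at the end.
  def batchTotal(readings: Seq[Int]): Int = readings.sum

  // Real-time: each element is handled as it arrives,
  // so a running result is emitted per element.
  def runningTotals(readings: Iterator[Int]): Iterator[Int] =
    readings.scanLeft(0)(_ + _).drop(1) // drop the initial 0 seed

  def main(args: Array[String]): Unit = {
    val readings = Seq(3, 5, 2)
    println(batchTotal(readings))                    // 10
    println(runningTotals(readings.iterator).toList) // List(3, 8, 10)
  }
}
```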

Real-time processing is in demand, and Apache Flink is a stream-processing framework built for it.

Some of Flink's features include:

  • High throughput
  • Support for Scala and Java
  • Low latency
  • Fault tolerance
  • Scalability

Let’s get started.

Step 1:

Add the required dependencies to build.sbt:

name := "flink-demo"
version := "0.1"
scalaVersion := "2.12.8"

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-scala" % "1.10.0",
  "org.apache.flink" % "flink-avro" % "1.10.0",
  "org.apache.flink" %% "flink-streaming-scala" % "1.10.0"
)

Step 2:

The next step is to create a handle to the execution environment in which this program runs. It plays a role similar to the SparkContext in Spark.

import org.apache.flink.streaming.api.scala._
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
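
To see the environment in action, here is a minimal sketch that can be run locally with the dependencies from Step 1. The object name, the `shout` helper, the sample words, and the job name are all made up for illustration. Note that declaring the pipeline only builds the job; nothing runs until `execute` is called.

```scala
import org.apache.flink.streaming.api.scala._

object EnvironmentSketch {
  // Hypothetical helper, used only to give the pipeline something to do.
  def shout(word: String): String = word.toUpperCase

  def main(args: Array[String]): Unit = {
    // Returns a local environment when run on your machine,
    // or the cluster's environment when the job is submitted to one.
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    // Declare a tiny pipeline: source -> map -> print to stdout.
    env.fromElements("avro", "flink", "scala")
      .map(word => shout(word))
      .print()

    // Nothing has run so far; execute() submits the job and blocks until it finishes.
    env.execute("environment-sketch")
  }
}
```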

Step 3:

Setting a parallelism of x on the environment makes x the default for all operators (such as join, map, and reduce), so each one runs with x parallel instances unless it sets its own value.

I am using 1 because this is a demo application.

env.setParallelism(1)
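
As a rough sketch of how the environment-level setting interacts with individual operators, the example below sets the default to 1 and then overrides it for a single map operator. The object name, the `configure` helper, the sample numbers, and the job name are assumptions for illustration only.

```scala
import org.apache.flink.streaming.api.scala._

object ParallelismSketch {
  // Hypothetical helper: creates an environment with the given default parallelism.
  def configure(defaultParallelism: Int): StreamExecutionEnvironment = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(defaultParallelism)
    env
  }

  def main(args: Array[String]): Unit = {
    val env = configure(1)
    println(env.getParallelism) // 1

    env.fromElements(1, 2, 3, 4)
      .map(n => n * 10).setParallelism(2) // this operator alone runs with 2 instances
      .print()                            // the sink falls back to the default of 1

    env.execute("parallelism-sketch")
  }
}
```

Because the map runs as two parallel instances here, the order in which the results are printed is not guaranteed.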

