Reading Avro files using Apache Flink

In this blog, we will see how to read the Avro files using Flink.

Before reading the files, let’s get an overview of Flink.

There are two types of processing –** batch and real-time.**

**Batch Processing: **Processing based on the data collected over time.
**Real-time Processing: **Processing based on immediate data for an instant result.

Real-time processing is in demand and Apache Flink is the real-time processing tool.

Some of the flink features include:

Fast speed
Support for scala and java
Low-latency
Fault-tolerance
Scalability

Let’s get started.

Step 1:

Add the required dependencies in build.sbt:

name := "flink-demo"

version := "0.1"

scalaVersion := "2.12.8"

libraryDependencies ++= Seq(

"org.apache.flink" %% "flink-scala" % "1.10.0",

"org.apache.flink" % "flink-avro" % "1.10.0",

"org.apache.flink" %% "flink-streaming-scala" % "1.10.0"

)

Step 2:

The next step is to create a pointer to the environment on which this program runs. In spark, it is similar to spark context.

val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

Step 3:

Setting parallelism of x here will cause all operators (such as join, map, reduce) to run with x parallel instance.

I am using 1 as it is a demo application.

env.setParallelism(1)

#apache flink #flink #scala #streaming ##apache-flink ##avro files #apache #avro

blog.knoldus.com

Reading Avro files using Apache Flink