Grace Lesch


Kafka to Flink to Cassandra

Minimum Requirements and Installations

To start the application, you will need Kafka and Cassandra installed locally on your machine. The minimum requirements for the application are:

Java 1.8+, Scala 2.12.2, Flink 1.9.0, sbt 1.3.12, Kafka 2.3.0, Cassandra 3.10

#cassandra #database

Myrl Prosacco


Using Apache Flink for Kinesis to Kafka Connect

In this blog, we are going to use Kinesis as a source and Kafka as a sink.

Let’s get started.

Step 1:

Apache Flink provides the Kinesis and Kafka connector dependencies. Let’s add them to our build.sbt:

name := "flink-demo"

version := "0.1"

scalaVersion := "2.12.8"

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-scala" % "1.10.0",
  "org.apache.flink" %% "flink-connector-kinesis" % "1.10.0",
  "org.apache.flink" %% "flink-connector-kafka" % "1.10.0",
  "org.apache.flink" %% "flink-streaming-scala" % "1.10.0"
)

Step 2:

The next step is to create a pointer to the environment on which this program runs.

val env = StreamExecutionEnvironment.getExecutionEnvironment

Step 3:

Setting a parallelism of x here causes all operators (such as join, map, and reduce) to run with x parallel instances.

I am using 1, as it is a demo application.

env.setParallelism(1)


Step 4:

Disable AWS CBOR, as we are testing locally.

System.setProperty("com.amazonaws.sdk.disableCbor", "true")
System.setProperty("", "true")

Step 5:

Define the Kinesis consumer properties:

  • Region
  • Stream position – TRIM_HORIZON, to read all the records available in the stream
  • AWS keys
  • Do not worry about the endpoint; it is set to http://localhost:4568, as we will test Kinesis using LocalStack.

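The properties listed above can be sketched as a plain java.util.Properties object. This is a minimal sketch in Java (it maps one-to-one to Scala), not the post's own code. The string keys are assumptions: they are what the connector's AWSConfigConstants and ConsumerConfigConstants fields are believed to resolve to, written out literally so the snippet needs no Flink dependency. The region and the dummy keys are placeholders, and the class name KinesisConsumerProps is hypothetical. Depending on the connector version, setting both a region and an endpoint may be rejected, so check the validation rules of the version in use.

```java
import java.util.Properties;

// Hypothetical helper: builds the consumer configuration described in Step 5.
class KinesisConsumerProps {

    static Properties build() {
        Properties props = new Properties();

        // Region (placeholder value, not taken from the original post).
        props.setProperty("aws.region", "us-east-1");

        // Start from the oldest record still available in the stream.
        props.setProperty("flink.stream.initpos", "TRIM_HORIZON");

        // AWS keys: LocalStack accepts dummy credentials (placeholders).
        props.setProperty("aws.credentials.provider.basic.accesskeyid", "dummy-access-key");
        props.setProperty("aws.credentials.provider.basic.secretkey", "dummy-secret-key");

        // Local endpoint used for testing against LocalStack.
        props.setProperty("aws.endpoint", "http://localhost:4568");

        return props;
    }
}
```

In a real job, this Properties object would be handed to the Kinesis consumer together with the stream name and a deserialization schema.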

#apache flink #flink #scala ##apache-flink ##kinesis #apache #flink streaming #kafka #scala

Dedric Reinger


Basic Anatomy of a Flink Program

Hi folks! I hope you are all safe during the COVID-19 pandemic and are learning new tools and tech while staying at home. I have also just started learning a prominent Big Data **framework** for stream processing: Flink. Flink is a distributed framework based on the streaming-first principle, meaning it is a true stream processing engine that implements batch processing as a special case. In this blog, we will look at the basic anatomy of a Flink program, which will help us understand its basic structure and how to start writing a basic Flink application.

Let’s explore the steps involved in setting up a streaming application in Flink with a simple example. In the example, we will read messages in the form of text from a socket text stream, then filter out the streaming text if it is a number. The Flink application for this use case is built in 5 steps, as shown below.
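Before going through the steps, the "is it a number" check at the heart of this example can be sketched in plain Java, without Flink. This is only an illustration under assumptions: the class and method names (NumberFilter, isNumber, keepNumbers) are hypothetical, and "number" is taken to mean a whole number that fits in a long. In a Flink job, the same predicate (or its negation, depending on whether numbers should be kept or dropped) would be passed to the stream's filter operator.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper showing the filtering logic on an in-memory list.
class NumberFilter {

    // Returns true when the text parses as a whole number (assumption: long range).
    static boolean isNumber(String text) {
        try {
            Long.parseLong(text.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Keeps only the lines for which the predicate holds.
    static List<String> keepNumbers(List<String> lines) {
        return lines.stream()
                .filter(NumberFilter::isNumber)
                .collect(Collectors.toList());
    }
}
```

For instance, given the lines "42", "hello", and "-7", keepNumbers retains "42" and "-7".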

Step 1: Setup Execution Environment

The very first step is to let Flink know the right environment for the application, that is, whether the streaming application will run locally or on a remote cluster that it needs to connect to. So, we need to create a stream execution environment.

StreamExecutionEnvironment executionEnvironment =

#apache flink #big data and fast data #flink #java ##apache flink ##flink #big data #big data analytics #fast data #stream processing #streaming

akshay L


Kafka Spark Streaming | Kafka Tutorial

In this Kafka Spark Streaming tutorial, you will learn what Apache Kafka is, the architecture of Apache Kafka and how to set up a Kafka cluster, what Spark is and its features, the components of Spark, and a hands-on demo on integrating Spark Streaming with Apache Kafka and integrating Spark Flume with Apache Kafka.

# Kafka Spark Streaming #Kafka Tutorial #Kafka Training #Kafka Course #Intellipaat

Gerhard Brink


Flink: Join two Data Streams


Apache Flink offers a rich set of APIs and operators, which makes Flink application developers productive when dealing with **multiple data streams**. Flink provides many multi-stream operations such as Union, Join, and so on. In this blog, we will explore the Window Join operator in Flink with an example. It joins two data streams on a given key and a common window.

Let’s say we have one stream that contains the salary information of all the individuals who belong to an organization. The salary information has the id, name, and salary of an individual. This stream is available at port 9000 on localhost.
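To make "joins two data streams on a given key and a common window" concrete, here is a minimal plain-Java sketch of the idea, not Flink's implementation and not the post's own code. It assumes tumbling windows of a fixed size and non-negative timestamps. The second stream (a department per id), the Event class, and all names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a tumbling-window join on a shared key.
class WindowJoinSketch {

    // One record of a stream: join key, payload, and event time in milliseconds.
    static final class Event {
        final int id;
        final String value;
        final long timestampMs;

        Event(int id, String value, long timestampMs) {
            this.id = id;
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    // Start of the tumbling window that contains the timestamp (assumes ts >= 0).
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    // Pairs every left/right event with the same key that falls in the same window.
    static List<String> join(List<Event> left, List<Event> right, long windowSizeMs) {
        List<String> joined = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                boolean sameKey = l.id == r.id;
                boolean sameWindow = windowStart(l.timestampMs, windowSizeMs)
                        == windowStart(r.timestampMs, windowSizeMs);
                if (sameKey && sameWindow) {
                    joined.add(l.id + "," + l.value + "," + r.value);
                }
            }
        }
        return joined;
    }
}
```

Two events with the same id are joined only when their timestamps fall into the same window; an event that arrives one window later finds no partner, which is the behaviour that distinguishes a window join from a plain key join.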

#apache flink #big data and fast data #flink #java ##apache flink #big #big data analytics #fast data analytics #flink streaming #joins #streaming #streaming analytics

Gerhard Brink


Stateful stream processing with Apache Flink(part 1): An introduction

Apache Flink, a 4th generation Big Data processing framework, provides robust **stateful stream processing capabilities**. So, over a few parts of this blog series, we will learn what stateful stream processing is and how we can use Flink to write a stateful streaming application.

What is stateful stream processing?

In general, stateful stream processing is an application design pattern for processing an unbounded stream of events. Stateful stream processing means a **“State”** is shared between events (stream entities), so past events can influence the way current events are processed.

Let’s try to understand it with a real-world scenario. Suppose we have a system that is responsible for generating a report comprising the total number of vehicles that passed through a toll plaza per hour/day. To achieve this, we save the count of the vehicles that passed through the toll plaza within one hour. That count is then accumulated with the following hours’ counts to find the total number of vehicles that passed through the toll plaza within 24 hours. Here we are saving a count, and it is nothing but the “State” of the application.

It might seem very simple, but in a distributed system stateful stream processing is very hard to achieve. It is much more difficult to scale up because different workers need to share the state. Flink provides ease of use, high efficiency, and high reliability for **_state management_** in a distributed environment.
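The toll plaza scenario can be sketched on a single machine in plain Java, just to show what "state" means here. This is an illustrative sketch under assumptions, not Flink's state API: the class TollCounter and its methods are hypothetical, and each event is reduced to the hour of day in which a vehicle passed.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process model of the toll plaza report.
class TollCounter {

    // State: vehicles counted so far, per hour of the day.
    private final Map<Integer, Long> countsPerHour = new TreeMap<>();

    // State: running total across all hours seen so far.
    private long dailyTotal = 0;

    // Processing one event updates the stored state.
    void onVehicle(int hourOfDay) {
        countsPerHour.merge(hourOfDay, 1L, Long::sum);
        dailyTotal++;
    }

    long countForHour(int hourOfDay) {
        return countsPerHour.getOrDefault(hourOfDay, 0L);
    }

    long dailyTotal() {
        return dailyTotal;
    }
}
```

The result for each new event depends on the counts accumulated from earlier events, which is exactly what makes the computation stateful. Here the state lives in one object's fields; the hard part in a distributed engine is keeping such state consistent and recoverable across many workers.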

#apache flink #big data and fast data #flink #streaming #streaming solutions ##apache flink #big data analytics #fast data analytics #flink streaming #stateful streaming #streaming analytics