Creating a Data Pipeline with Spark Streaming, Kafka and Cassandra

Hi Folks!! In this blog, we are going to learn how we can integrate Spark Structured Streaming with Kafka and Cassandra to build a simple data pipeline.

Spark Structured Streaming is a component of the Apache Spark framework that enables scalable, high-throughput, fault-tolerant processing of data streams.

Apache Kafka is a scalable, high-performance, low-latency platform for reading and writing streams of data, much like a messaging system.

Apache Cassandra is a distributed, wide-column NoSQL data store.

Minimum Requirements and Installations

To start the application, we’ll need Kafka, Spark and Cassandra installed locally on our machine. The minimum requirements for the application:

Java 1.8+, Scala 2.12.10, SBT 1.3.8, Spark 2.4.0, Kafka 2.3.0, Cassandra 3.10
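As a rough sketch, the versions listed above could translate into a build.sbt along these lines. The project name and the Cassandra connector artifact and version are assumptions, not taken from this post, so check them against the connector's compatibility notes before relying on them.

```scala
// build.sbt (sketch): versions follow the requirements listed above
name := "kafka-to-cassandra" // hypothetical project name
scalaVersion := "2.12.10"

libraryDependencies ++= Seq(
  // Spark SQL provides the Structured Streaming API
  "org.apache.spark" %% "spark-sql" % "2.4.0",
  // Kafka source for Structured Streaming
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.0",
  // Assumption: DataStax connector for writing to Cassandra; version not stated in this post
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.4.2"
)
```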


data from Kafka topic we will get Dataset[Car] as a result. We can apply s

Connecting to Kafka and reading streams.

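      // Assumes a SparkSession named `spark` with `import spark.implicits._` in scope
      spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092") // local Kafka broker
        .option("subscribe", "kafkaToCassandra")             // topic to read from
        .option("startingOffsets", "earliest")               // start from the oldest record
        .load()                                              // build the streaming DataFrame
        .selectExpr("cast(value as string) as value")        // Kafka values arrive as bytes
        .select(from_json(col("value"), carSchema).as[Car])  // parse JSON into Dataset[Car]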

The code snippet above reads JSON data from the Kafka topic “kafkaToCassandra”, which contains information about cars. The Car model looks like this:

    case class Car(
      Name: String,
      Miles_per_Gallon: Option[Double],
      Cylinders: Option[Long],
      Displacement: Option[Double],
      Horsepower: Option[Long],
      Weight_in_lbs: Option[Long],
      Acceleration: Option[Double],
      Year: String,
      Origin: String
    )
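The pieces above can be put together in a small, self-contained sketch. It makes a few assumptions that are not in the original post: the object name `KafkaCarReader`, the app name, deriving `carSchema` from the case class with `Encoders.product` (the post does not show how `carSchema` is defined), and printing to the console sink just to check that records arrive. It also needs the `spark-sql-kafka-0-10` package on the classpath.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.StructType

// Same fields as the Car model shown in this post
case class Car(
  Name: String,
  Miles_per_Gallon: Option[Double],
  Cylinders: Option[Long],
  Displacement: Option[Double],
  Horsepower: Option[Long],
  Weight_in_lbs: Option[Long],
  Acceleration: Option[Double],
  Year: String,
  Origin: String
)

// Hypothetical object name, used only for this sketch
object KafkaCarReader {

  // Assumption: derive the JSON schema from the case class instead of writing it by hand
  val carSchema: StructType = Encoders.product[Car].schema

  // Reads the Kafka topic as a stream and parses each JSON value into a Car
  def readCars(spark: SparkSession): Dataset[Car] = {
    import spark.implicits._ // provides the Encoder needed by .as[Car]
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "kafkaToCassandra")
      .option("startingOffsets", "earliest")
      .load()
      .selectExpr("cast(value as string) as value")
      .select(from_json(col("value"), carSchema).as[Car])
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaToCassandra") // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    // Console sink is used here only to verify the read path
    readCars(spark)
      .writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```

Deriving the schema from the case class keeps the field names in one place, so the JSON keys on the topic must match the `Car` field names exactly, including case.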
