Spark Integration with Kafka (Batch)

In this article we discuss integrating Spark (2.4.x) with Kafka for batch processing of queries.
Kafka:-
Kafka is a distributed publish/subscribe messaging system that acts as a pipeline for transferring real-time data in a fault-tolerant and parallel manner. It helps build real-time streaming data pipelines that reliably move data between systems or applications. This data can be ingested and processed either continuously (Spark Structured Streaming) or in batches. This article covers batch ingestion of data from Kafka using Spark: how Spark interacts with Kafka, and the Spark APIs used for reading and writing data.
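As a rough illustration of the batch read and write path described above, here is a minimal PySpark sketch. It assumes the spark-sql-kafka-0-10 connector package is on the Spark classpath and that a SparkSession is passed in by the caller. The function names, the broker address and the topic names are hypothetical and chosen only for this example; the option keys (kafka.bootstrap.servers, subscribe, startingOffsets, endingOffsets, topic) are the ones the Spark Kafka source and sink accept.

```python
from typing import Dict


def kafka_batch_read_options(bootstrap_servers: str,
                             topic: str,
                             starting_offsets: str = "earliest",
                             ending_offsets: str = "latest") -> Dict[str, str]:
    """Build the options for a bounded (batch) read from Kafka.

    Hypothetical helper for this sketch; only the option keys come from Spark.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # comma-separated host:port list
        "subscribe": topic,                            # topic(s) to read from
        "startingOffsets": starting_offsets,           # where the batch begins
        "endingOffsets": ending_offsets,               # where the batch ends
    }


def read_batch(spark, options: Dict[str, str]):
    """Run a one-off batch query against Kafka and return key/value as strings."""
    # spark.read (rather than spark.readStream) makes this a batch query.
    reader = spark.read.format("kafka")
    for name, value in options.items():
        reader = reader.option(name, value)
    df = reader.load()
    # Kafka delivers key and value as binary columns, so cast them before use.
    return df.selectExpr("CAST(key AS STRING) AS key",
                         "CAST(value AS STRING) AS value")


def write_batch(df, bootstrap_servers: str, topic: str) -> None:
    """Write a DataFrame that has key and value columns to a Kafka topic."""
    (df.selectExpr("CAST(key AS STRING) AS key",
                   "CAST(value AS STRING) AS value")
       .write
       .format("kafka")
       .option("kafka.bootstrap.servers", bootstrap_servers)
       .option("topic", topic)
       .save())
```

With a running broker, a call might look like read_batch(spark, kafka_batch_read_options("localhost:9092", "events")), followed by write_batch(df, "localhost:9092", "events-out"). Splitting the options into their own function keeps the offset range explicit, which is what distinguishes a bounded batch read from a continuous stream.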

#big-data #java #spark #kafka

medium.com
