Apache Kafka is a stream-processing software platform originally developed by LinkedIn, open sourced in early 2011 and currently developed by the Apache Software Foundation. It is written in Scala and Java.
Intro to Streams by Confluent
Kafka is a distributed system that consists of servers and clients.
Servers and clients communicate via a high-performance TCP network protocol and are fully decoupled and agnostic of each other.
But what is an event? In Kafka, an event is an object that has a key, a value and a timestamp. Optionally, it could have other metadata headers. You can think an event as a record or a message.
One or more events are organized in topics: producers can write messages/events on different topics and consumers can choose to read and process events of one or more topics. In Kafka, you can configure how long events of a topic should be retained, therefore, they can be read whenever needed and are not deleted after consumption.
A consumer cosumes the stream of events of a topic at its own pace and can commit its position (called offset). When we commit the offset we set a pointer to the last record that the consumer has consumed.
#kafka #python #streaming #big-data #docker