How to install Kafka using Docker and produce/consume messages in Python

Apache Kafka is a stream-processing software platform originally developed by LinkedIn, open sourced in early 2011 and currently developed by the Apache Software Foundation. It is written in Scala and Java.

(Video: “Intro to Streams” by Confluent)

Key Concepts of Kafka

Kafka is a distributed system that consists of servers and clients.

  • Some servers, called brokers, form the storage layer. Other servers run Kafka Connect to continuously import and export data as event streams, integrating Kafka with your existing systems.
  • Clients, on the other hand, let you build applications that read, write and process streams of events. A client can be a producer or a consumer: a producer writes (produces) events to Kafka, while a consumer reads and processes (consumes) events from Kafka. A minimal sketch of both roles follows this list.
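
Here is that sketch, using the kafka-python client. The broker address localhost:9092 and the topic name demo-topic are assumptions for illustration (e.g. a single broker running in Docker), not values taken from a real setup:

```python
from kafka import KafkaProducer, KafkaConsumer

# Producer: writes (produces) events to a topic.
# localhost:9092 assumes a broker running locally, e.g. in Docker.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("demo-topic", b"hello kafka")
producer.flush()  # block until the event has really been sent

# Consumer: reads and processes (consumes) events from the same topic.
consumer = KafkaConsumer(
    "demo-topic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the oldest retained event
    consumer_timeout_ms=5000,      # stop iterating after 5 s without new events
)
for message in consumer:
    print(message.key, message.value)
```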

Servers and clients communicate via a high-performance TCP network protocol and are fully decoupled and agnostic of each other.

But what is an event? In Kafka, an event is an object that has a key, a value and a timestamp. Optionally, it can carry other metadata headers. You can think of an event as a record or a message.
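
For example, with kafka-python you can set each part of an event explicitly when producing it (the broker address and topic name are again placeholder assumptions):

```python
import time
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# An event = key + value + timestamp, plus optional metadata headers.
producer.send(
    "demo-topic",
    key=b"user-42",                        # events with the same key land in the same partition
    value=b'{"action": "login"}',
    headers=[("source", b"web")],          # optional list of (str, bytes) headers
    timestamp_ms=int(time.time() * 1000),  # defaults to the current time if omitted
)
producer.flush()
```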

Events are organized into topics: producers write messages/events to different topics, and consumers choose the topic(s) whose events they read and process. In Kafka, you can configure how long the events of a topic should be retained; they can therefore be read whenever needed and are not deleted after consumption.
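
As a sketch of this, retention can be configured per topic at creation time with kafka-python's admin client; the topic name and the 7-day retention value are illustrative assumptions:

```python
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Create a topic whose events are retained for 7 days,
# whether or not they have already been consumed.
admin.create_topics([
    NewTopic(
        name="demo-topic",
        num_partitions=3,
        replication_factor=1,  # 1 is enough for a single-broker Docker setup
        topic_configs={"retention.ms": str(7 * 24 * 60 * 60 * 1000)},
    )
])
admin.close()
```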

A consumer consumes the stream of events of a topic at its own pace and can commit its position, called the offset. Committing the offset sets a pointer to the last record the consumer has consumed.
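
Below is a minimal sketch of committing offsets manually with kafka-python; the group id demo-group and the topic name are assumptions:

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "demo-topic",
    bootstrap_servers="localhost:9092",
    group_id="demo-group",       # offsets are tracked per consumer group
    enable_auto_commit=False,    # we move the pointer ourselves
    auto_offset_reset="earliest",
)

for message in consumer:
    print(message.offset, message.value)  # stand-in for real processing
    consumer.commit()  # point the group's offset past this record
```

With auto-commit disabled, an event is only marked as consumed after it has been processed, so a crash between reading and committing means the event is read again on restart rather than lost.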

#kafka #python #streaming #big-data #docker
