It is often said that Kafka is a messaging system. Many people think of it as a message bus that moves messages from one place to another. That is true, but in reality it is more than that.

Kafka is an open-source, distributed event streaming platform that can handle all of the data and all of the events within an entire system. It is a platform on which real-time applications can be built, and it can integrate different systems together, driven by the power of events.

It streams records in a fault-tolerant, durable way, providing backpressure, integration points, and decoupling of source systems from sink systems. Combined with a flexible architecture, these features make Kafka a powerful tool that delivers high throughput and low latency.

Kafka is built on the publish-subscribe pattern, which enables an application to announce messages to multiple interested consumers asynchronously, without coupling the senders to the receivers. Kafka runs as a cluster on one or more servers, which makes it easier to meet the guarantees given by its creators.
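To make the decoupling concrete, here is a minimal in-memory sketch of the publish-subscribe pattern, not Kafka's actual API (the `Broker` class and its methods are hypothetical names for illustration): producers publish to a named topic, and every subscriber of that topic receives the message, without either side knowing about the other.

```python
from collections import defaultdict
from typing import Any, Callable


class Broker:
    """A minimal in-memory broker illustrating publish-subscribe decoupling."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        # A consumer registers interest in a topic, never in a specific producer.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> None:
        # The producer only knows the topic name; every subscriber
        # of that topic receives the message asynchronously in principle
        # (synchronously here, to keep the sketch short).
        for handler in self._subscribers[topic]:
            handler(message)


broker = Broker()
received: list[str] = []
broker.subscribe("orders", received.append)              # consumer A
broker.subscribe("orders", lambda m: print("audit:", m))  # consumer B
broker.publish("orders", "order-42 created")
```

Note that adding consumer B required no change to the producer side; that independence of senders and receivers is the property Kafka builds on, at cluster scale and with durable storage.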

This rather formal, general description of Kafka's architecture might seem overwhelming, so the next chapter introduces Kafka's core concept.

Core Concept

In the “old” days, information was stored in databases. This approach led developers to think about programming in terms of things (e.g. a ticket, a truck, a camera). Each thing had a state, and that state was stored in the DB.

Lately, some people have proposed thinking about programming in terms of events instead. Each event has state (like a “thing” stored in the DB), but it also carries a description of what happened and an indication of when it took place.
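A sketch of that idea, using a hypothetical `TicketEvent` type (the names are illustrative, not from Kafka): the event carries the state a database row would hold, plus what happened and when.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class TicketEvent:
    # State, much like the columns of a row stored in a DB.
    ticket_id: str
    status: str
    # A description of what happened...
    description: str
    # ...and an indication of when it took place.
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


event = TicketEvent("T-1", "closed", "ticket closed by a support agent")
```

Because an event is immutable and timestamped, a stream of such events records not just the current state of a thing but its full history, which is exactly what a Kafka topic stores.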


Setting the Scene for Apache Kafka