Comparison of Open Source Apache Kafka vs Vendors including Confluent, Cloudera, Red Hat, Amazon MSK. Let’s see how big Kafka really is.
Apache Kafka became the de facto standard for event streaming. The open-source community is huge. Various vendors added Kafka and related tooling to their offerings or provide a Kafka cloud service. This blog post uses the car analogy - from the motor engine to the self-driving car - to explore the different Kafka offerings available on the market. I also cover a few other vehicles, meaning (partly) Kafka-compatible technologies. The goal is not a feature-by-feature comparison (that would be outdated the day after the publication). Instead, the intention is to educate about the different deployment models, product strategies, and trade-offs from the available options.
Disclaimer: I work for Confluent. However, the post is not about comparing features but explaining the concepts behind the alternatives. I talk to enterprises across the globe every week. I can assure you that many people I talk to are not aware or mislead about what you read in the following sections. Hence, I hope that the following helps you to make the right decision. Either choose to run open-source Apache Kafka or one of the various commercial Kafka offerings, or even a combination of both.
The goal is not to introduce Kafka here. The minimum you should know is that Kafka is NOT just a messaging layer for data ingestion into a data lake. This is just a fraction of today’s usages.
Kafka is an open-source framework under Apache 2.0 license. It provides a combination of messaging, storage, processing, and integration of high volumes of data at scale in real-time and fault-tolerant. That’s what makes Kafka unique compared to other MQ, ETL, ESB, and API platforms.
Kafka is deployed in production for various use cases across industries. This includes analytical and mission-critical workloads. Different deployments require different SLAs. You should always ask yourself what happens if the Kafka infrastructure is in trouble. What are your RTO (Recovery Time Objective) and RPO (Recovery Point Objective)? Or in other words: How much data is okay to lose? How much downtime is acceptable? Start your Kafka projects with these questions in mind when you start your comparison of the options!
#aws #big-data #cloud #kafka #apache-kafka