If your business is your Solar System, then your Data is the SUN, it has both gravity & mass, everything revolves around it, it must live forever — Myself

Introduction

Kafka is one of the most popular messaging systems out there. It is used to move real-time streams of data, to collect big data, to do real-time analysis, or all of these at once. Kafka commonly streams data into data lakes, applications, and real-time stream analytics systems.


Apache Kafka Use Cases

Kafka: Your central source for Data Lake

Kafka is a Swiss Army knife for building a scalable and fault-tolerant architecture. While you can use Kafka in a variety of use cases, this post details how you can leverage Kafka as the main source for your Data Lake.

The high-level architecture is simple: the data you care about keeping for the long term, whether for business decisions, analytics, compliance, or simply because you never want to delete it, can be dumped from Kafka onto shared S3-compatible object storage (like Ceph). The S3 and S3A interfaces are pervasive, and almost all your favorite tools support these protocols, so you are unlikely to feel locked in.
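To make the Kafka-to-object-storage flow a little more concrete, here is a minimal sketch of the batching step, using only the Python standard library. Everything named here is an assumption for illustration and not something this post prescribes: the `Record` type stands in for a consumed Kafka message, the `topic/dt=.../partition=.../offset` key layout is just one plausible convention, and `put_object` is a hypothetical callable you would supply.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one consumed Kafka message."""
    topic: str
    partition: int
    offset: int
    timestamp_ms: int
    value: bytes


def object_key(topic: str, partition: int, first_offset: int, timestamp_ms: int) -> str:
    """Build an object key from topic, UTC date, partition, and first offset.

    The layout is an assumed convention, not a standard.
    """
    day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{topic}/dt={day}/partition={partition}/{first_offset:020d}.jsonl"


class Archiver:
    """Buffers records per (topic, partition) and writes each batch as one object."""

    def __init__(self, put_object: Callable[[str, bytes], None], max_records: int = 1000):
        self._put = put_object          # caller-supplied uploader: (key, body) -> None
        self._max = max_records
        self._buffers: Dict[Tuple[str, int], List[Record]] = {}

    def add(self, record: Record) -> None:
        key = (record.topic, record.partition)
        buf = self._buffers.setdefault(key, [])
        buf.append(record)
        if len(buf) >= self._max:
            self.flush(key)

    def flush(self, key: Tuple[str, int]) -> None:
        buf = self._buffers.pop(key, [])
        if not buf:
            return
        first = buf[0]
        # One message per line; the batch is keyed by its first record.
        body = b"\n".join(r.value for r in buf) + b"\n"
        self._put(object_key(first.topic, first.partition, first.offset, first.timestamp_ms), body)

    def flush_all(self) -> None:
        for key in list(self._buffers):
            self.flush(key)
```

In a real deployment, `put_object` would likely wrap an S3 client call pointed at your Ceph Object Gateway endpoint, and the records would come from a Kafka consumer that commits offsets only after a successful upload. Purpose-built tools such as Secor already handle those concerns, so treat this as an illustration of the idea and not as a replacement for them.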

For a greenfield deployment on a public cloud provider such as AWS, GCP, Azure, or Oracle Cloud, you can build this architecture on the object storage service that the provider already offers.

For on-premises greenfield or brownfield deployments, you need to be meticulous when choosing components for your architecture. If your business is your Solar System, then your data is the Sun: it has both gravity and mass, everything revolves around it, and it must live forever.

Public cloud gives you agility at a low cost. With services like AWS Snowball and Snowmobile, it's not super difficult to move the data you have accumulated over the years into public cloud object storage. The most ignored question is how to bring your data back home from the public cloud. In all the product and service announcements from the cloud majors, have you ever heard of one launching a service that helps you move the data stored with them, at scale and at a reasonable cost, back to your on-premises location?

#openshift #kubernetes #kafka #secor #object-storage #data analysis
