In my previous blog post, https://medium.com/@mykidong/cloud-native-data-platform-without-hadoop-installation-6254a8ea3473 , I introduced the concept of a cloud native data platform without a Hadoop installation.
That post gave us some idea of how to build a data platform on Kubernetes without Hadoop. The main Hadoop components can be replaced with alternatives. HDFS can be replaced with S3-compatible object storage such as Ceph, MinIO or Ozone, all of which can run on Kubernetes, and YARN can be replaced with Kubernetes itself if Spark is the main execution engine. All the other data platform components, such as Presto, Hive on Spark and Kafka, can also run on Kubernetes.
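To make the HDFS-to-S3 and YARN-to-Kubernetes substitution a little more concrete, here is a minimal Python sketch that assembles a `spark-submit` command for such a setup. The property names (`spark.kubernetes.*`, `spark.hadoop.fs.s3a.*`) are standard Spark and Hadoop S3A settings. The function names, API server URL, image, namespace, endpoint, bucket and application path are hypothetical placeholders. The sketch also assumes the Spark container image already bundles the `hadoop-aws` S3A connector.

```python
from __future__ import annotations

import shlex


def spark_conf_for_k8s_s3(
    k8s_api_server: str,
    image: str,
    namespace: str,
    s3_endpoint: str,
    access_key: str,
    secret_key: str,
    executors: int = 2,
) -> dict[str, str]:
    """Hypothetical helper: Spark properties that use Kubernetes as the cluster
    manager (instead of YARN) and an S3-compatible store via S3A (instead of HDFS)."""
    return {
        # Kubernetes takes over YARN's role as the resource manager.
        "spark.master": f"k8s://{k8s_api_server}",
        "spark.kubernetes.container.image": image,
        "spark.kubernetes.namespace": namespace,
        "spark.executor.instances": str(executors),
        # S3A connector pointed at a self-hosted object store (e.g. MinIO or Ceph RGW).
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.hadoop.fs.s3a.endpoint": s3_endpoint,
        # Plain-text credentials are for illustration only.
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Address buckets as http(s)://host/bucket rather than bucket.host.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def spark_submit_command(conf: dict[str, str], app: str, app_args: list[str]) -> list[str]:
    """Hypothetical helper: build a cluster-mode spark-submit argument list."""
    cmd = ["spark-submit", "--master", conf["spark.master"], "--deploy-mode", "cluster"]
    for key, value in sorted(conf.items()):
        if key != "spark.master":  # already passed via --master
            cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    cmd += app_args
    return cmd


if __name__ == "__main__":
    # All values below are placeholders, not real endpoints.
    conf = spark_conf_for_k8s_s3(
        k8s_api_server="https://k8s-api.example.internal:6443",
        image="example/spark-with-hadoop-aws:latest",
        namespace="spark",
        s3_endpoint="http://minio.minio.svc:9000",
        access_key="ACCESS_KEY",
        secret_key="SECRET_KEY",
    )
    cmd = spark_submit_command(
        conf,
        "local:///opt/app/etl.py",  # a file baked into the container image
        ["s3a://example-bucket/input/", "s3a://example-bucket/output/"],
    )
    print(shlex.join(cmd))
```

Path-style access is enabled because self-hosted stores like MinIO are usually addressed by path rather than by virtual-host bucket names. In a real deployment, the S3 credentials would more likely come from Kubernetes secrets than from plain `--conf` values.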
Here, I am going to extend that concept and talk about building a private cloud platform based on Kubernetes.