In a previous blog post, I introduced the concept of a cloud native data platform without a Hadoop installation: https://medium.com/@mykidong/cloud-native-data-platform-without-hadoop-installation-6254a8ea3473 .

That post gave us an idea of how to build a data platform on Kubernetes without Hadoop. The main components of Hadoop can be replaced with alternatives: HDFS can be replaced with S3 compatible object storage such as Ceph, MinIO, or Ozone, all of which can run on Kubernetes, and YARN can be replaced with Kubernetes itself if Spark is the main compute engine. Other data platform components such as Presto, Hive on Spark, and Kafka can also run on Kubernetes.
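To make the replacement concrete, here is a minimal sketch of the Spark properties involved when Kubernetes takes the place of YARN and an S3 compatible store takes the place of HDFS. The property keys are standard Spark and Hadoop S3A settings, but the helper names (`spark_on_k8s_conf`, `to_spark_submit_args`) and every URL, image name, and namespace below are placeholders I made up for illustration. Credentials are left out on purpose; they would have to be supplied separately.

```python
from typing import Dict, List


def spark_on_k8s_conf(
    k8s_api_url: str,
    s3_endpoint: str,
    container_image: str,
    namespace: str = "spark",
) -> Dict[str, str]:
    """Build Spark properties for running on Kubernetes with S3A storage.

    All argument values are supplied by the caller; nothing here is
    specific to any particular cluster.
    """
    return {
        # Kubernetes replaces YARN as the cluster manager.
        "spark.master": f"k8s://{k8s_api_url}",
        "spark.submit.deployMode": "cluster",
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.container.image": container_image,
        # An S3 compatible object store (Ceph, MinIO, Ozone) replaces HDFS.
        "spark.hadoop.fs.s3a.endpoint": s3_endpoint,
        # Path-style access is commonly needed for self-hosted S3 endpoints.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def to_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn the properties into a spark-submit argument list."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical endpoints and image name, for illustration only.
    conf = spark_on_k8s_conf(
        k8s_api_url="https://kubernetes.example.internal:6443",
        s3_endpoint="http://minio.example.internal:9000",
        container_image="example/spark:latest",
    )
    print(" ".join(to_spark_submit_args(conf)))
```

With a configuration like this, a Spark job would read and write `s3a://` paths instead of `hdfs://` paths, and its driver and executors would be scheduled as Kubernetes pods.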

Here, I am going to extend that concept and talk about building a private cloud platform based on Kubernetes.

#hadoop #delta-lake #spark #minio #kubernetes

A Concept: Kubernetes based Private Cloud Platform