Hive on Spark in Kubernetes


Running Hive on Kubernetes is not easy. As far as I know, Tez, one of Hive's execution engines, runs only on YARN, not on Kubernetes.

There is an alternative way to run Hive on Kubernetes, though. Spark can run on Kubernetes, and Spark Thrift Server, which is compatible with HiveServer2, is a great candidate. In other words, Spark acts as the Hive execution engine.
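
Because Spark Thrift Server speaks the HiveServer2 protocol, clients can reach it with an ordinary `jdbc:hive2://` URL, for example through `beeline`. The following is a minimal sketch of building such a connection command. The service host name and user are hypothetical placeholders, and port 10000 is only the usual HiveServer2 default, so adjust both to your deployment.

```python
import shlex


def hive2_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Build a HiveServer2-style JDBC URL, the scheme Spark Thrift Server also accepts."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(url: str, user: str) -> str:
    """Return a shell-safe beeline invocation: -u is the JDBC URL, -n is the user name."""
    parts = ["beeline", "-u", url, "-n", user]
    return " ".join(shlex.quote(part) for part in parts)


# Hypothetical in-cluster service name (assumption); replace with your own.
url = hive2_jdbc_url("spark-thrift-server.spark.svc")
print(beeline_command(url, "hive"))
```

Any tool that already connects to HiveServer2 over JDBC should be able to use the same URL format.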

In this post, I will explain how to run Hive on Spark in a Kubernetes cluster.

All the code mentioned here can be cloned from my GitHub repo: https://github.com/mykidong/hive-on-spark-in-kubernetes

Assumption: S3 Bucket and NFS as Kubernetes Storage Are Available

Before running Hive on Kubernetes, an S3 bucket and NFS-backed Kubernetes storage should be available to your Kubernetes cluster.

The S3 bucket will be used to store the uploaded Spark dependency jars, Hive table data, and so on.
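
To make the role of the bucket concrete, here is a rough sketch of the Spark settings that typically point a job at an S3 bucket through the S3A connector. The property names are standard Spark and Hadoop S3A options, but the bucket name, endpoint, credentials, and sub-paths are hypothetical, and the configuration in the repo may differ. Path-style access is assumed here because S3-compatible stores installed inside a cluster often need it.

```python
def s3a_spark_conf(bucket: str, endpoint: str, access_key: str, secret_key: str) -> dict:
    """Collect Spark properties for S3A access (all values are caller-supplied)."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Assumption: an S3-compatible store that needs path-style URLs.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        # Where spark-submit uploads local dependency jars (hypothetical sub-path).
        "spark.kubernetes.file.upload.path": f"s3a://{bucket}/spark-uploads",
        # Where table data is stored (hypothetical sub-path).
        "spark.sql.warehouse.dir": f"s3a://{bucket}/warehouse",
    }


def to_submit_args(conf: dict) -> list:
    """Turn the properties into a flat list of `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Placeholder values for illustration only.
args = to_submit_args(s3a_spark_conf("my-bucket", "http://minio:9000", "ak", "sk"))
print(" ".join(args))
```

The resulting arguments can be appended to a `spark-submit` command line. In practice the credentials would come from a Kubernetes secret rather than being written inline.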

NFS storage will be used to support the PVC ReadWriteMany access mode, which the Spark job needs.
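
As an illustration of what a ReadWriteMany claim looks like, the sketch below builds a PersistentVolumeClaim manifest and prints it as JSON, which `kubectl apply -f -` accepts as well as YAML. The claim name, the storage class name `nfs`, and the size are hypothetical and depend on how your NFS provisioner is installed.

```python
import json


def rwx_pvc(name: str, storage_class: str, size: str = "10Gi",
            namespace: str = "default") -> dict:
    """Build a PersistentVolumeClaim that requests the ReadWriteMany access mode."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # ReadWriteMany lets pods on different nodes mount the volume read-write.
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical claim and storage class names (assumptions).
manifest = rwx_pvc("spark-driver-pvc", "nfs")
print(json.dumps(manifest, indent=2))
```

Piping the printed JSON into `kubectl apply -f -` would create the claim, provided a storage class with that name exists and supports ReadWriteMany.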

If you do not have an S3 bucket and NFS available, you can install them on your Kubernetes cluster manually, as I did.
