Create a Scale-Out Hive Cluster With a Distributed, MySQL-Compatible Database

Hive Metastore supports several backend databases, of which MySQL is the most commonly used. In practice, however, MySQL has a clear limitation: as Hive metadata grows, a standalone MySQL instance becomes a performance bottleneck. Combining individual MySQL databases into a cluster drastically increases complexity. In scenarios with huge amounts of metadata (for example, a single table with more than 10 million or even 100 million rows), MySQL is not a good choice.

We had this problem, and our migration story shows that TiDB, an open-source distributed Hybrid Transactional/Analytical Processing (HTAP) database, works well in these scenarios.

In this post, I’ll share with you how to create a Hive cluster with TiDB as the Metastore database at the backend so that you can use TiDB to horizontally scale Hive Metastore without worrying about database capacity.

Why Use TiDB in Hive as the Metastore Database?

TiDB is a distributed SQL database built by PingCAP and its open-source community. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability. It’s a one-stop solution for both Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP) workloads.

Because TiDB has a distributed architecture, query performance is not limited by the capability of a single machine, even with enormous amounts of data. When the data volume approaches the cluster's capacity, you can add nodes to increase TiDB's storage capacity.

Because TiDB is compatible with the MySQL protocol, it’s easy to switch Hive’s Metastore database to TiDB. You can use TiDB as if you were using MySQL, with almost no changes:

For an existing Hive cluster, you can use the mysqldump tool to copy all data in MySQL to TiDB.
For a new Hive cluster, you can use the metadata initialization tool that comes with Hive.
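
To make the first option concrete, here is a minimal Python sketch that builds the two commands involved: a mysqldump export from the existing MySQL Metastore and an import into TiDB through the standard mysql client. The host names, the user, the database name "hive", and the dump file name are hypothetical placeholders; port 4000 is TiDB's default SQL port. The sketch only builds and prints the commands so that you can review them before running anything.

```python
import shlex


def build_dump_command(mysql_host, mysql_port, user, database, dump_file):
    """Build a mysqldump command that exports one database to a file."""
    return [
        "mysqldump",
        f"--host={mysql_host}",
        f"--port={mysql_port}",
        f"--user={user}",
        "-p",                      # prompt for the password interactively
        "--single-transaction",    # consistent snapshot for InnoDB tables
        "--databases", database,   # include CREATE DATABASE / USE in the dump
        f"--result-file={dump_file}",
    ]


def build_import_command(tidb_host, tidb_port, user, dump_file):
    """Build a mysql client command that replays the dump against TiDB."""
    return [
        "mysql",
        f"--host={tidb_host}",
        f"--port={tidb_port}",
        f"--user={user}",
        "-p",
        f"--execute=source {dump_file}",
    ]


if __name__ == "__main__":
    # All connection details below are hypothetical placeholders.
    dump = build_dump_command("mysql.example.internal", 3306, "hive", "hive", "hive_metastore.sql")
    load = build_import_command("tidb.example.internal", 4000, "hive", "hive_metastore.sql")
    print(shlex.join(dump))
    print(shlex.join(load))
```

Stopping the Metastore service before dumping is a sensible precaution, so that no metadata changes are lost between the export and the switch to TiDB.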

How to Create a Hive Cluster With TiDB

Creating a Hive cluster with TiDB involves the following steps:

Meet component requirements
Install a Hive cluster
Deploy a TiDB cluster
Configure Hive
Initialize metadata
Launch Metastore and test
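
As a rough illustration of the "Configure Hive" step, the Python sketch below generates the Metastore connection properties for hive-site.xml. Because TiDB speaks the MySQL protocol, the sketch assumes a regular MySQL JDBC URL and driver. The host, database name, user, and password are hypothetical placeholders, and the driver class name assumes MySQL Connector/J 5.x (Connector/J 8.x uses com.mysql.cj.jdbc.Driver instead).

```python
import xml.etree.ElementTree as ET


def build_hive_site(host, port, database, user, password):
    """Return hive-site.xml content that points the Metastore at TiDB."""
    # TiDB is MySQL compatible, so a plain jdbc:mysql:// URL is assumed here.
    jdbc_url = f"jdbc:mysql://{host}:{port}/{database}"
    properties = {
        "javax.jdo.option.ConnectionURL": jdbc_url,
        # Assumption: Connector/J 5.x driver class name.
        "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }

    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical connection details; 4000 is TiDB's default SQL port.
    print(build_hive_site("tidb.example.internal", 4000, "hive", "hive", "change-me"))
```

With a configuration like this in place, the "Initialize metadata" step would typically be run with Hive's schematool (for example, schematool -dbType mysql -initSchema), using the MySQL database type because TiDB is MySQL compatible.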

