Hive Metastore supports various backend databases, among which MySQL is the most commonly used. However, in real-world scenarios, MySQL’s shortcoming is obvious: as metadata grows in Hive, MySQL is limited by its standalone performance and can’t deliver good performance. When individual MySQL databases form a cluster, the complexity drastically increases. In scenarios with huge amounts of metadata (for example, a single table has more than 10 million or even 100 million rows of data), MySQL is not a good choice.
We had this problem, and our migration story proves that TiDB, an open-source distributed Hybrid Transactional/Analytical Processing (HTAP) database, is a perfect solution in these scenarios.
In this post, I’ll share with you how to create a Hive cluster with TiDB as the Metastore database at the backend so that you can use TiDB to horizontally scale Hive Metastore without worrying about database capacity.
TiDB is a distributed SQL database built by PingCAP and its open-source community. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability. It’s a one-stop solution for both Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP) workloads.
In scenarios with enormous amounts of data, due to TiDB’s distributed architecture, query performance is not limited to the capability of a single machine. When the data volume reaches the bottleneck, you can add nodes to improve TiDB’s storage capacity.
Because TiDB is compatible with the MySQL protocol, it’s easy to switch Hive’s Metastore database to TiDB. You can use TiDB as if you were using MySQL, with almost no changes:
mysqldump
tool to replicate all data in MySQL to TiDB.Creating a Hive cluster with TiDB involves the following steps:
#database #tutorial #mysql #hive #mysql database #scale out #hive cluster