While moving the Hadoop workload from an on-premises CDH cluster to Azure, we also had to migrate the existing on-premises Hive metastore. This article describes two best practices for migrating Hive metadata from on-premises to Azure HDInsight.
Option 1 (DB replication):
1. Set up database replication between the on-premises Hive metastore DB and the HDInsight Hive metastore DB.
2. Use the Hive MetaTool to replace the HDFS URLs recorded in the metastore with WASB/ADLS/ABFS URLs:
./hive --service metatool -updateLocation wasb://<container_name>@<storage_account_name>.blob.core.windows.net/ hdfs://<namenode>:8020/
The MetaTool does not copy any data. It rewrites the database, table, and partition locations stored in the metastore so that they point to the target WASB/ADLS/ABFS storage instead of HDFS. Note that -updateLocation takes the new location first and the old location second.
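A cautious way to run the MetaTool is to list the filesystem roots currently recorded in the metastore, preview the rewrite with -dryRun, and only then apply it. The sketch below wraps that sequence in a shell function. The function name run_metatool_update and the HIVE_CMD override (defaulting to ./hive, as in the command above) are hypothetical conveniences; -listFSRoot, -updateLocation, and -dryRun are MetaTool options.

```shell
#!/bin/sh
# Hypothetical wrapper around the Hive MetaTool.
# HIVE_CMD is an assumed override for the hive launcher; defaults to ./hive.

# Usage: run_metatool_update NEW_LOC OLD_LOC [apply]
# Without "apply", it only lists the FS roots and previews the change.
run_metatool_update() {
  new_loc="$1"
  old_loc="$2"
  mode="${3:-preview}"
  hive_cmd="${HIVE_CMD:-./hive}"

  if [ -z "$new_loc" ] || [ -z "$old_loc" ]; then
    echo "usage: run_metatool_update NEW_LOC OLD_LOC [apply]" >&2
    return 2
  fi

  # Show the filesystem roots currently recorded in the metastore.
  $hive_cmd --service metatool -listFSRoot || return 1

  if [ "$mode" = "apply" ]; then
    # Persist the rewrite. -updateLocation takes the NEW location first.
    $hive_cmd --service metatool -updateLocation "$new_loc" "$old_loc"
  else
    # Print what would change without writing to the metastore.
    $hive_cmd --service metatool -updateLocation "$new_loc" "$old_loc" -dryRun
  fi
}
```

Run it first without apply, review the dry-run output, then rerun it with apply as the third argument. Backing up the metastore DB before applying the change is a sensible precaution.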
Recommendation: This approach is recommended when the source and target metastore DBs are identical, or when you are setting up or migrating existing applications.
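Option 2 (scripts):
1. Generate the Hive DDLs (as SQL) from the on-premises Hive metastore with a wrapper bash script:
bash hive_table_dd.sh metastoreDB
2. Edit the generated SQL to replace the HDFS URLs with WASB/ADLS/ABFS URLs.
3. Run the updated SQL on the metastore from the HDInsight cluster.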
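Replacing the HDFS URLs in the generated SQL can be scripted rather than done by hand. The sketch below performs a literal (non-regex) replacement with awk's index(), so characters such as . and / in the URLs need no escaping. The function name rewrite_locations is hypothetical, and it assumes the generated SQL contains the HDFS URL exactly as passed in and that neither URL contains backslashes (awk -v would treat them as escapes).

```shell
#!/bin/sh
# Hypothetical helper: copy stdin to stdout, replacing every literal
# occurrence of OLD (e.g. an HDFS URL) with NEW (e.g. a WASB/ABFS URL).
# Usage: rewrite_locations OLD NEW < input.sql > output.sql
rewrite_locations() {
  if [ -z "$1" ]; then
    echo "rewrite_locations: OLD must not be empty" >&2
    return 2
  fi
  awk -v old="$1" -v new="$2" '
  {
    out = ""
    rest = $0
    # index() matches literally, unlike sub()/gsub(), which use regexes.
    while ((i = index(rest, old)) > 0) {
      out = out substr(rest, 1, i - 1) new
      rest = substr(rest, i + length(old))
    }
    print out rest
  }'
}
```

For example, if the output of hive_table_dd.sh has been saved to onprem_ddl.sql (a hypothetical file name), running rewrite_locations "hdfs://<namenode>:8020/" "wasb://<container_name>@<storage_account_name>.blob.core.windows.net/" < onprem_ddl.sql > azure_ddl.sql writes the updated SQL to azure_ddl.sql.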
Ensure that the Hive metastore version is compatible between the on-premises and Azure HDInsight Hive instances.
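To check the versions, one option is to run Hive's schematool with -info on each side (schematool -dbType <db_type> -info, where <db_type> is for example mysql, postgres, or mssql) and compare the reported metastore schema versions. The helpers below are a hypothetical sketch: schema_version assumes the output contains a line beginning "Metastore schema version:", and treating two versions as compatible when their major.minor components match is a simplifying assumption, not an official Hive rule.

```shell
#!/bin/sh
# Hypothetical helpers for comparing metastore schema versions.

# Read `schematool -dbType <db_type> -info` output on stdin and print the
# value of its "Metastore schema version:" line (assumed output format).
schema_version() {
  awk -F: '/^Metastore schema version/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Succeed when two versions share the same major.minor prefix.
# Simplifying compatibility assumption; confirm against Hive release notes.
same_major_minor() {
  a=$(printf '%s\n' "$1" | cut -d. -f1-2)
  b=$(printf '%s\n' "$2" | cut -d. -f1-2)
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Capture schema_version on the on-premises cluster and on HDInsight, then pass both values to same_major_minor. A non-zero exit status means the versions need a closer look before running the migration.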
Recommendation: This approach is recommended when the source and target metastore DBs are not identical, or when you are setting up a new environment.
Validation: To confirm that the Hive metastore has been migrated completely, run the bash script from step 1 (hive_table_dd.sh) against both metastore DBs (source and target) to print all Hive tables and their data locations.
Compare the outputs from the on-premises and Azure HDInsight metastores to verify that no tables are missing in the new metastore DB.
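The comparison above can be automated. The sketch below assumes, hypothetically, that each line printed by hive_table_dd.sh starts with the table name followed by whitespace and its location; adjust the awk field numbers to the script's real output. missing_tables prints tables present in the source listing but absent from the target, and hdfs_locations flags target entries whose location still points at HDFS. Both function names are illustrative.

```shell
#!/bin/sh
# Hypothetical validation helpers for the outputs of hive_table_dd.sh.
# Assumed line format: <table_name> <location>

# Print table names found in SOURCE_FILE but not in TARGET_FILE.
# Usage: missing_tables SOURCE_FILE TARGET_FILE
missing_tables() {
  # FILENAME == ARGV[1] (rather than NR == FNR) stays correct even when
  # the target listing is empty.
  awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
       NF && !($1 in seen) { print $1 }' "$2" "$1"
}

# Print target entries whose location still starts with hdfs://.
# Usage: hdfs_locations TARGET_FILE
hdfs_locations() {
  awk 'index($2, "hdfs://") == 1 { print $1, $2 }' "$1"
}
```

With the two outputs saved as onprem_tables.txt and hdi_tables.txt (hypothetical names), missing_tables onprem_tables.txt hdi_tables.txt should print nothing, and hdfs_locations hdi_tables.txt should print nothing once every location points to WASB/ADLS/ABFS.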
#azure #migration #hive #metastore