Walkthrough Example with Google’s Big Query

There are a few simple data quality checks you can build in your Data Warehouse process to detect data inconsistencies due to errors within your ETL/ELT pipelines or connection failures. It’s always a good idea to measure data quality and to fastly detect errors, otherwise a lack of trust from the consumer can be the result. The following article will show you some easy to adapt data consistency checks between your source and target system — for example a Data Warehouse or a Data Lake.

Image for post

Data quality measurement process

Check Tables via Count Rows

A very simple check to detect deviations between source and target systems is to count rows of the tables in both systems. This check of course will only make sense, if you have an equal table structure in your source and target system, which is realized via an ELT process.

Example count rows in Big Query:

SELECT count(1) FROM `double-silo-282323.Test_Data.table_1` ;
SELECT count(1) FROM `double-silo-282323.Test_Data.table_2`

You can do this manually in both systems, but an automatic approach would be a greater idea — for example by taking the count result from the source system with your ETL tool to your Data Warehouse/Lake and creating reports like in the figure shown below:

Image for post

Example Report

You can also improve it by trigger alarms e.g. with a cloud function and sending them via E-Mail to your data engineers.

Check Table Content via Business Key Figures

To check if the content of the tables is identical you will need other technics than counting the row numbers. One example could be the usage of key figures. You can sum up certain rows like price or amount and check if the sum of source and target system table is the same. The challenge here is to find good key figures, however, you can realize this check without much computing power, but with just a simple SUM() function in SQL.

#data-processing #data-warehouse #data-quality #google-cloud-platform #bigquery #data analysis

Data Warehouse Quality Matters
1.45 GEEK