Data Warehouse Quality Matters

There are a few simple data quality checks you can build into your Data Warehouse process to detect data inconsistencies caused by errors in your ETL/ELT pipelines or by connection failures.

Walkthrough Example with Google's BigQuery

It is always a good idea to measure data quality and to detect errors quickly; otherwise, the consumers of your data may lose trust in it. The following article shows some easy-to-adopt data consistency checks between your source and your target system, for example a Data Warehouse or a Data Lake.

[Figure: Data quality measurement process]

Check Tables via Row Counts

A very simple check for detecting deviations between source and target systems is to count the rows of the tables in both systems. Of course, this check only makes sense if the source and target share the same table structure, which is usually the case with an ELT process.

Example row count queries in BigQuery:

-- Count the rows of the first table (e.g. the source-side copy)
SELECT COUNT(1) FROM `double-silo-282323.Test_Data.table_1`;
-- Count the rows of the second table (e.g. the target-side copy)
SELECT COUNT(1) FROM `double-silo-282323.Test_Data.table_2`;

You can do this manually in both systems, but an automated approach is a better idea: for example, bring the count result from the source system into your Data Warehouse/Lake with your ETL tool and create reports like the one shown in the figure below:

[Figure: Example report]
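
A minimal sketch of a query behind such a report, assuming your ETL tool writes the source counts into a hypothetical table `Test_Data.source_row_counts` with the columns `table_name` and `source_count` (both the table and its columns are assumptions for illustration):

-- Hypothetical report query: compare source counts loaded by the ETL tool
-- against the current row counts in the target system.
WITH target_counts AS (
  SELECT 'table_1' AS table_name,
         (SELECT COUNT(1) FROM `double-silo-282323.Test_Data.table_1`) AS target_count
)
SELECT s.table_name,
       s.source_count,
       t.target_count,
       s.source_count - t.target_count AS diff  -- 0 means source and target match
FROM `double-silo-282323.Test_Data.source_row_counts` AS s
JOIN target_counts AS t USING (table_name)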

You can also improve this setup by triggering alerts, for example with a Cloud Function that notifies your data engineers via e-mail.
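
One simple building block for such an alert, sketched here with the example tables from above, is a check query that returns a row only when the counts deviate; a scheduled query or Cloud Function can then send an e-mail whenever the result set is not empty:

-- Returns a row only if the two counts deviate; an empty result means OK.
WITH counts AS (
  SELECT
    (SELECT COUNT(1) FROM `double-silo-282323.Test_Data.table_1`) AS source_count,
    (SELECT COUNT(1) FROM `double-silo-282323.Test_Data.table_2`) AS target_count
)
SELECT source_count, target_count
FROM counts
WHERE source_count <> target_count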

Check Table Content via Business Key Figures

To check whether the content of the tables is identical, you need techniques other than counting rows. One option is to use business key figures: sum up a numeric column such as price or amount and check whether the sums in the source and target tables match. The challenge is to find good key figures; the check itself needs little computing power and can be realized with a simple SUM() in SQL, as sketched below.
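
A minimal sketch of such a key-figure check, assuming both tables contain a numeric price column (the column name is an assumption for illustration):

-- Compare a business key figure: the sum of a hypothetical 'price' column.
WITH figures AS (
  SELECT
    (SELECT SUM(price) FROM `double-silo-282323.Test_Data.table_1`) AS source_sum,
    (SELECT SUM(price) FROM `double-silo-282323.Test_Data.table_2`) AS target_sum
)
SELECT source_sum,
       target_sum,
       source_sum - target_sum AS diff  -- 0 means the key figure matches
FROM figures

Note that floating-point columns can show tiny rounding differences between systems, so rounding the sums or allowing a small tolerance in the comparison can make the check more robust.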
