On GCP, BigQuery is a serverless way of doing petabyte-scale analytics. This blog introduces BigQuery, the data warehouse solution on GCP.


Introduction

BigQuery is a data warehouse built for the cloud. It is Google’s proprietary data warehouse solution on Google Cloud Platform.

BigQuery is serverless, which means that as customers we don’t have to configure or manage any servers or storage; Google does that behind the scenes. Our job is to upload the data and query it, so we can focus on the business rather than on infrastructure.

BigQuery is not a transactional database like MySQL or Oracle. It is designed for analytical workloads.

For example, a query like the one below is called an analytical query because its purpose is to analyze the data and return aggregate results such as count, sum, max, min, and avg.

Here we are trying to find the title and total_views for each Wikipedia page on a single day.

SELECT
  title,
  SUM(views) AS total_views
FROM
  `bigquery-public-data.wikipedia.pageviews_2020`
WHERE
  DATE(datehour) = "2020-04-18"
GROUP BY
  title
ORDER BY
  total_views DESC;
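
To try the shape of this query without a GCP project, here is a minimal sketch that runs the same kind of aggregation locally. It uses Python’s built-in sqlite3 module as a stand-in, not BigQuery itself. The sample rows, the local table name pageviews, and the helper top_pages are all made up for illustration. It also returns COUNT(views) next to SUM(views), because the two differ: COUNT gives the number of rows per title, while SUM adds up the views.

```python
import sqlite3

# Hypothetical sample rows shaped like (datehour, title, views).
# This is made-up data, not real Wikipedia pageviews.
SAMPLE_ROWS = [
    ("2020-04-18 00:00:00", "Page_A", 10),
    ("2020-04-18 01:00:00", "Page_A", 5),
    ("2020-04-18 00:00:00", "Page_B", 7),
    ("2020-04-19 00:00:00", "Page_B", 100),  # different day, filtered out below
]


def top_pages(rows, day):
    """Return (title, row_count, total_views) per title for one day, busiest first."""
    conn = sqlite3.connect(":memory:")  # local stand-in, not a BigQuery connection
    try:
        conn.execute(
            "CREATE TABLE pageviews (datehour TEXT, title TEXT, views INTEGER)"
        )
        conn.executemany("INSERT INTO pageviews VALUES (?, ?, ?)", rows)
        query = """
            SELECT title,
                   COUNT(views) AS row_count,   -- number of rows per title
                   SUM(views)   AS total_views  -- views added up per title
            FROM pageviews
            WHERE DATE(datehour) = ?
            GROUP BY title
            ORDER BY total_views DESC
        """
        return conn.execute(query, (day,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for title, row_count, total_views in top_pages(SAMPLE_ROWS, "2020-04-18"):
        print(title, row_count, total_views)
```

The query never touches a single row by key, which is what a transactional workload would do. It scans every row for the day and collapses them into one result per title.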

Analytical queries are very useful in reporting and business intelligence because they provide insights from data, based on which the business side can make tactical decisions for the company.

Architecture

Because BigQuery is serverless, we don’t actually need to know about the underlying architecture, but knowing it helps us optimize our queries, cost, and performance in some scenarios.
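
On the cost side, one practical check is a dry run, which asks BigQuery how many bytes a query would scan without actually running it. The sketch below assumes the google-cloud-bigquery Python client is installed and that default credentials and a project are configured. The helper names dry_run_bytes and estimate_cost_usd are made up for illustration, and the price per TiB is a value you supply yourself, not an official figure.

```python
def estimate_cost_usd(bytes_processed, usd_per_tib):
    """Convert scanned bytes into a cost, for a caller-supplied price per TiB."""
    return bytes_processed / 2**40 * usd_per_tib


def dry_run_bytes(sql):
    """Ask BigQuery how many bytes a query would scan, without running it."""
    # Imported here so estimate_cost_usd works without the client library installed.
    from google.cloud import bigquery

    # Assumes default credentials and a default project are configured.
    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed
```

A typical use would be to pass the query text to dry_run_bytes and feed the result into estimate_cost_usd with the current on-demand price for your region. Comparing the byte counts of two versions of a query is a quick way to see whether a change, such as selecting fewer columns, reduces the amount of data scanned.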

BigQuery is built on top of Google’s Dremel technology, which has been used in production inside Google since 2006 in many services. (Please refer to the reference section for the paper.)

Dremel is Google’s interactive ad-hoc query system, designed to query read-only data. BigQuery uses Dremel as its execution engine.

Apart from Dremel, BigQuery uses other Google technologies such as Borg, the Colossus file system, the Jupiter network, and Capacitor.

