Bigquery is a fantastic tool! It lets you do really powerful analytics works all using SQL like syntax.

But it lacks chaining the SQL queries. We cannot run one SQL right after the completion of another. There are many real-life applications where the output of one query depends upon for the execution of another. And we would want to run multiple queries to achieve the results.

Here is one scenario, suppose you are doing RFM analysis using BigQuery ML. Where first you have to calculate the RFM values for all the users then apply the k-means cluster to the result of the first query and then merge the output of the first query and second query to generate the final data table.

In the above scenario, every next query depends upon the output of the previous query and the output of each query also needs to be stored in data for other uses.

I this guide I will show how to execute as many SQL queries as you want in BigQuery one after another creating a chaining effect to gain the desire results.


Methods

I will demonstrate two approaches to chaining the queries

  1. The First using cloud pub/sub and cloud function: This is a more sophisticated method as it ensures that the current query is finished executing before executing the next one. This method also required a bit of programming experience so better to reach out to someone with a technical background in your company.
  2. The second using BigQuery’s own scheduler: However, the query scheduler cannot ensure the execution of one query is complete before the next is triggered so we will have to hack it using query execution time. More on this later.

And If you want to get your hands dirty yourself then here is an excellent course to start with.

Note: We will continue with the RFM example discussed above to get you the idea of the process. But the same can be applied for any possible scenario where triggering multiple SQL queries is needed.


Method 1

Method 1 uses the combination of cloud functions and pub/subs to chain the entire flow. The process starts by query scheduler which after executing the first query sends a message to pub/sub topic that triggers a cloud function responsible to trigger 2nd query and once completed sends a message to another pub/sub topic to start yet another cloud function. The process continues until the last query is executed by the cloud function.

Let’s understand the process with our RFM analysis use case.

Suppose we have three queries that are needed to be run one after another to perform RFM analysis.

First, that calculates RFM values, we will call it _RFM Values_.

Second, that creates the model, we will call it_RFM Model_.

Third_, that merges model output with users RFM values, we will call it __RFM Final_.

#sql #bigquery #query #gcp #cloud-functions #function

Chaining Multiple SQL Queries in BigQuery
14.15 GEEK