Car buying is fun with Snowflake Part 3

In part 2, the entire pipeline to ingest data into Snowflake was automated using Azure Logic Apps and Snowpipe. The JSON data was loaded into a Snowflake landing table with a single column called JSON_DATA.

Ideally, there should also be a datetime column containing the date and time when the data was loaded into the landing table. However, due to a limitation in Snowpipe, no additional columns are allowed when using the JSON file format. If you try, you will get the following error.
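For illustration, here is a sketch of the kind of pipe definition that triggers the limitation, assuming the landing table had an extra LOAD_DTTM column (the pipe and stage names here are hypothetical, not from the earlier parts):

//Hypothetical pipe: the JSON file format cannot populate more than one column
CREATE OR REPLACE PIPE VMS_AZURE_BLOB_PIPE AUTO_INGEST = TRUE AS
COPY INTO VMS_AZURE_BLOB_LZ1 //imagine this table had JSON_DATA plus LOAD_DTTM
FROM @VMS_AZURE_STAGE
FILE_FORMAT = (TYPE = 'JSON');
//Fails with an error along the lines of: "JSON file format can produce
//one and only one column of type variant or object or array."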

Snowflake’s Streams and Tasks features can be leveraged to move this data into a second landing table with additional columns such as LOAD_DTTM (load date time).

A Snowflake Stream helps with CDC (change data capture). It works somewhat like a Kafka topic and contains one row per change in its base table, in this case VMS_Azure_Blob_LZ1 (landing zone 1).

//Create a stream on VMS_Azure_Blob_LZ1 table
CREATE OR REPLACE STREAM VMS_AZURE_BLOB_LZ1_STREAM ON TABLE "VMS"."PUBLIC"."VMS_AZURE_BLOB_LZ1";

//Verify using
SHOW STREAMS;
//Verify that the stream works by invoking the REST API to load some sample data into LZ1, then run a SELECT on the stream
SELECT * FROM VMS_AZURE_BLOB_LZ1_STREAM;
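Each stream row also carries metadata columns describing the change, which is what the Task below filters on. The METADATA$ columns here are standard Snowflake stream metadata:

//Inspect the change metadata the stream tracks alongside the data
SELECT
  JSON_DATA,
  METADATA$ACTION,   //INSERT or DELETE
  METADATA$ISUPDATE, //TRUE when the row is part of an update
  METADATA$ROW_ID    //unique id of the changed row
FROM VMS_AZURE_BLOB_LZ1_STREAM;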

The next step is to insert the data present in the stream into the Landing Zone 2 table. It is a simple SQL insert like this:

//Create a sequence to generate auto-incremented ids, then the 2nd landing table

CREATE OR REPLACE SEQUENCE VMS_AZURE_BLOB_LZ2_SEQ;

CREATE OR REPLACE TABLE VMS_AZURE_BLOB_LZ2 (
 SEQ_ID NUMBER(38,0) NOT NULL DEFAULT VMS.PUBLIC.VMS_AZURE_BLOB_LZ2_SEQ.NEXTVAL,
 LOAD_DTTM TIMESTAMP_NTZ(9) NOT NULL DEFAULT CURRENT_TIMESTAMP(),
 JSON_DATA VARIANT NOT NULL
) COMMENT='Data will be inserted from stream and task';
//Test and verify that the SELECT from the stream works as intended before using it in a Task
INSERT INTO "VMS"."PUBLIC"."VMS_AZURE_BLOB_LZ2" (JSON_DATA) (
  SELECT
    JSON_DATA
  FROM
    VMS_AZURE_BLOB_LZ1_STREAM
  WHERE
    "METADATA$ACTION" = 'INSERT'
);

Remember, the whole idea is to automate, so the inserts need to run automatically. This can be done using a Snowflake Task, which basically works as a task scheduler using the “cron” time format. In this case it is set to 30 20 * * * to run at 8:30 PM PT, after all the dealerships close.

CREATE OR REPLACE TASK VMS_AZURE_BLOB_MOVE_LZ1_TO_LZ2_TASK
  WAREHOUSE = TASKS_WH //A specific WH created to be used for Tasks only to show up on bill as separate line item
  SCHEDULE  = 'USING CRON 30 20 * * * America/Vancouver' //Process new records every night at 20:30HRS
WHEN
  SYSTEM$STREAM_HAS_DATA('VMS_AZURE_BLOB_LZ1_STREAM')
AS
  INSERT INTO "VMS"."PUBLIC"."VMS_AZURE_BLOB_LZ2" (JSON_DATA) (
  SELECT 
    JSON_DATA
  FROM
    VMS_AZURE_BLOB_LZ1_STREAM
  WHERE
    "METADATA$ACTION" = 'INSERT'
);

Notice that in the above DDL the warehouse specified is TASKS_WH. It was created specifically to run Tasks, so in the monthly bill that Snowflake generates it will appear as a separate line item, making it easier to monitor and track the cost of Tasks. Aligning multiple task schedules on the same warehouse also lets the compute handle several of them together, resulting in cost savings.
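One more step is easy to miss: Snowflake creates tasks in a suspended state, so the task has to be resumed before the schedule takes effect. State and past runs can then be checked like this:

//Newly created tasks are suspended; resume to start the schedule
ALTER TASK VMS_AZURE_BLOB_MOVE_LZ1_TO_LZ2_TASK RESUME;
//Verify state and the next scheduled run
SHOW TASKS;
//Review past executions of the task
SELECT * FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY()) ORDER BY SCHEDULED_TIME DESC;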
