Alisha Larkin

1620876720

Creating external tables in Azure Synapse Analytics

In this article, we will learn to create external tables in Azure Synapse Analytics with dedicated SQL pools.

Introduction

In the previous parts of the Azure Synapse Analytics article series, we learned how to use SQL Server Integration Services (SSIS) to populate data. At times, one may need to access data in place without copying the entire dataset into Azure Synapse. Typically, an Azure Data Lake Storage account is used to host large volumes of data files. Querying those files directly in Azure Data Lake Storage, without physically copying the data into the local storage of an Azure Synapse Analytics dedicated SQL pool, provides fast, ad-hoc access to data that is hosted outside the bounds of Azure Synapse Analytics. Let’s go ahead and see how to create external tables in Azure Synapse Analytics with dedicated SQL pools.

Pre-requisites

Azure Synapse Analytics offers two types of SQL pools: the SQL on-demand pool (since renamed the serverless SQL pool) and the dedicated SQL pool. SQL on-demand pools do not have any local storage at all, so the only option is to access data from different sources in place. Dedicated SQL pools, in contrast, offer a distributed parallel-processing engine with the option to store massive data volumes locally as well. This local data may need to reference data stored externally, i.e., outside Azure Synapse Analytics. This is the exact use case we are going to address in this article. For this, we need an Azure Synapse Analytics workspace and a dedicated SQL pool in place, as covered in the previous part of this article series.

When we create a Synapse workspace, it is linked to a primary Azure Data Lake Storage Gen2 account, which can be created as part of the workspace setup. We need some sample data on this storage account, as it will act as the external data that we will attempt to access from the SQL pool instance. In this case, the sample data was exported from an Azure SQL database in CSV format and stored as text files on the Azure Data Lake Storage account. Open Synapse Studio from the Synapse workspace. In the Data section, under the Linked tab, one can explore the files stored in the Azure Data Lake Storage account. Right-click on the file and select Preview to explore the data in the file as shown below. This data file, named SalesLTCustomers.txt, is the file that we intend to access from the SQL pool instance.
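The excerpt stops before the DDL itself. As a rough sketch of where the walkthrough is heading, the Python helper below only assembles the T-SQL that a dedicated SQL pool typically uses for this scenario (an external data source with TYPE = HADOOP, a delimited-text external file format, and the external table itself), so the statements can be reviewed and then run from any SQL client. The storage account, container, object names (`AdlsSampleData`, `CsvWithHeader`), column list and the credential name `AdlsCredential` are hypothetical placeholders. The credential is assumed to already exist (created with CREATE DATABASE SCOPED CREDENTIAL, which also requires a database master key), and the file is assumed to have a header row.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    sql_type: str


def build_external_table_ddl(
    storage_account: str,
    container: str,
    file_path: str,
    table_name: str,
    columns: list[Column],
    credential: str = "AdlsCredential",  # hypothetical name; must already exist
) -> list[str]:
    """Return T-SQL for a PolyBase external table over a CSV file in ADLS Gen2.

    Targets a dedicated SQL pool (TYPE = HADOOP data source) and skips one
    header row in the file via FIRST_ROW = 2.
    """
    if not columns:
        raise ValueError("at least one column is required")
    location = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    data_source = (
        "CREATE EXTERNAL DATA SOURCE AdlsSampleData\n"
        f"WITH (LOCATION = '{location}', CREDENTIAL = {credential}, TYPE = HADOOP);"
    )
    file_format = (
        "CREATE EXTERNAL FILE FORMAT CsvWithHeader\n"
        "WITH (FORMAT_TYPE = DELIMITEDTEXT,\n"
        "      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '\"', FIRST_ROW = 2));"
    )
    # One bracket-quoted column definition per line inside the table body.
    column_sql = ",\n    ".join(f"[{c.name}] {c.sql_type}" for c in columns)
    external_table = (
        f"CREATE EXTERNAL TABLE {table_name} (\n    {column_sql}\n)\n"
        f"WITH (LOCATION = '{file_path}', DATA_SOURCE = AdlsSampleData, FILE_FORMAT = CsvWithHeader);"
    )
    return [data_source, file_format, external_table]


if __name__ == "__main__":
    # Hypothetical subset of the exported SalesLT customer columns.
    cols = [Column("CustomerID", "INT"), Column("FirstName", "NVARCHAR(50)"), Column("LastName", "NVARCHAR(50)")]
    for statement in build_external_table_ddl(
        "mysynapsestorage", "sampledata", "/SalesLTCustomers.txt", "dbo.SalesLTCustomers_ext", cols
    ):
        print(statement, end="\n\n")
```

Generating the statements keeps the column list in one place; the names and types should still be checked against the actual exported file before anything is run against the pool.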

Christa Stehr

1603941420

Support for Synapse SQL serverless in Azure Synapse Link for Azure Cosmos DB

Co-authored by Rodrigo Souza, Ramnandan Krishnamurthy, Anitha Adusumilli and Jovan Popovic (Azure Cosmos DB and Azure Synapse Analytics teams)

Azure Synapse Link now supports querying Azure Cosmos DB data using Synapse SQL serverless. This capability, available in public preview, allows you to use familiar analytical T-SQL queries and build powerful near real-time BI dashboards on Azure Cosmos DB data.

As announced at Ignite 2020, you can now also query data in the Azure Cosmos DB API for MongoDB using Azure Synapse Link, enabling analytics with Synapse Spark and Synapse SQL serverless.

Support for T-SQL queries and building near real-time BI dashboards

Azure Synapse SQL serverless (previously known as SQL on-demand) is a serverless, distributed data processing service that offers built-in fault tolerance for query execution and a consumption-based pricing model. It enables you to analyze your data in the Azure Cosmos DB analytical store within seconds, without any impact on the performance or request units (RUs) of your transactional workloads.

Using the OPENROWSET syntax and automatic schema inference, data and business analysts can use the familiar T-SQL query language to quickly explore and reason about the contents of the Azure Cosmos DB analytical store. You can query this data in place without the need to copy or load it into a specialized store.
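As a minimal sketch of the OPENROWSET pattern described above, the helper below builds a serverless SQL query against a container's analytical store using the key-based connection string form (`Account=...;Database=...;Region=...;Key=...`). Because no WITH clause is supplied, the column schema is inferred automatically. The account, database, region and container names are hypothetical, and embedding an account key in query text is only reasonable for quick exploration; the current documentation should be checked for credential-based alternatives.

```python
def cosmos_openrowset_query(
    account: str,
    database: str,
    region: str,
    key: str,
    container: str,
    top: int = 10,
) -> str:
    """Build a serverless SQL pool query over a Cosmos DB analytical store.

    No WITH clause is given, so the column schema is inferred automatically.
    """
    if top <= 0:
        raise ValueError("top must be a positive integer")
    connection = f"Account={account};Database={database};Region={region};Key={key}"
    return (
        f"SELECT TOP {top} *\n"
        f"FROM OPENROWSET('CosmosDB', '{connection}', {container}) AS documents;"
    )


if __name__ == "__main__":
    # Hypothetical account and container; never commit a real account key.
    print(cosmos_openrowset_query("mycosmosaccount", "RetailDemo", "westus2", "<account-key>", "Customers"))
```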

You can also create SQL views that join data in the analytical stores of multiple Azure Cosmos DB containers, organizing your data in a semantic layer that accelerates your data exploration and reporting workloads. BI professionals can then quickly create Power BI reports on top of these SQL views in DirectQuery mode.
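A semantic-layer view of the kind described above might look like the one generated below: two containers read through OPENROWSET and joined on a shared property. This is a sketch under assumptions: the container names, the `customerId` join property and the projected columns are hypothetical, the connection string uses the key-based Cosmos DB form, and the view is assumed to be created in a user database in the serverless SQL pool rather than in master.

```python
def build_join_view(
    view_name: str,
    connection: str,
    left_container: str,
    right_container: str,
    join_property: str,
    left_columns: list[str],
    right_columns: list[str],
) -> str:
    """Return CREATE VIEW T-SQL joining two Cosmos DB analytical stores."""
    if not left_columns and not right_columns:
        raise ValueError("select at least one column")
    # Prefix each projected column with the alias of the container it comes from.
    select_list = ", ".join(
        [f"l.[{c}]" for c in left_columns] + [f"r.[{c}]" for c in right_columns]
    )
    return (
        f"CREATE VIEW {view_name} AS\n"
        f"SELECT {select_list}\n"
        f"FROM OPENROWSET('CosmosDB', '{connection}', {left_container}) AS l\n"
        f"JOIN OPENROWSET('CosmosDB', '{connection}', {right_container}) AS r\n"
        f"    ON l.[{join_property}] = r.[{join_property}];"
    )


if __name__ == "__main__":
    conn = "Account=mycosmosaccount;Database=RetailDemo;Region=westus2;Key=<account-key>"
    print(build_join_view("dbo.CustomerOrders", conn, "Customers", "Orders", "customerId",
                          ["customerId", "name"], ["orderId", "total"]))
```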

You can further extend this by building a logical data warehouse to create and analyze unified views of data across Azure Cosmos DB, Azure Data Lake Storage and Azure Blob Storage.

Ruthie Bugala

1620431700

Analyze Azure Cosmos DB data using Azure Synapse Analytics

This article will help you understand how to analyze Azure Cosmos DB data using Azure Synapse Analytics.

Introduction

Azure Cosmos DB is a multi-model NoSQL database that can host various types of data that are transactional in nature. OLTP systems employ transactional databases to host operational data, but these databases do not scale or perform well enough for large-scale analytics over large volumes of transactional data. Columnar data warehouses are one of the preferred, effective, and proven means of analyzing and aggregating large volumes of data at big data scale, and Azure Synapse is the data warehouse offering in the Microsoft Azure technology stack.

The challenge with analyzing operational data in a columnar warehouse is that one needs to replicate and/or relocate the data from operational repositories into analytical repositories. Hybrid transactional/analytical processing (HTAP) is an approach in which data stored in a row-oriented transactional format is also automatically organized in a columnar format, largely eliminating the need to replicate and/or relocate it. Azure supports this approach by allowing data hosted in Cosmos DB to be analyzed with Azure Synapse. In this article, we will learn how to set this up.
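To make the row-versus-column idea behind HTAP concrete, the toy Python function below pivots a few row-oriented documents into one list per field, which is the shape a columnar store scans when an aggregate touches only a single field. It is purely illustrative and is not how Azure Cosmos DB builds its analytical store internally; the document fields are hypothetical.

```python
from typing import Any


def to_columnar(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented documents into one list per field.

    Documents may have different fields (as in a NoSQL store); a missing
    field becomes None so every column has one entry per document.
    """
    fields: list[str] = []
    for row in rows:
        for name in row:
            if name not in fields:  # keep first-seen field order
                fields.append(name)
    return {name: [row.get(name) for row in rows] for name in fields}


orders = [
    {"id": "1", "region": "EU", "total": 20.0},
    {"id": "2", "region": "US", "total": 35.5},
    {"id": "3", "region": "EU"},  # this document has no "total" field
]
columns = to_columnar(orders)
# An aggregate over one field only needs to read that field's list.
grand_total = sum(v for v in columns["total"] if v is not None)
print(columns["total"], grand_total)  # [20.0, 35.5, None] 55.5
```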

Pre-requisites

We assume that the data is hosted in an Azure Cosmos DB instance. To simulate this, we need an Azure Cosmos DB account created with the Core (SQL) API, with all the preview features turned on. Once the account is created, it is listed as shown below.

Aisu Joesph

1623721730

Integrating Azure Purview with Azure Synapse Analytics

In this article, we will learn how to integrate Azure Purview and Azure Synapse Analytics capabilities to access data catalog assets hosted in Purview from Azure Synapse.

Introduction

Data exists in various formats in various types of repositories across different clouds as well as on-premises. With this growing data landscape, two of the most common capabilities required to manage data and extract value from it are data cataloging and data warehousing. Data cataloging (or metadata cataloging) lets teams keep track of how metadata evolves, and it also acts as a guiding beacon for all data pipelines that move data from source to destination. Data warehousing provides the approach and capabilities to process large volumes of data efficiently when data from across the enterprise is collated to derive insights. If these two capabilities are not integrated, the teams managing them have no view of each other’s landscape. Typically, like many other data capabilities, data warehousing is one of the biggest consumers of data catalogs. Azure provides Purview for data cataloging and governance, and Azure Synapse Analytics for data warehousing. In this article, we will see how to integrate these two capabilities to access data catalog assets hosted in Azure Purview from Azure Synapse.

Pre-requisites

As we are going to work with Azure Purview as well as Azure Synapse, we need a few things in place before we can start configuring these tools to integrate with each other. It is assumed that one has the required privileges to administer and operate Purview and Azure Synapse services on their Azure account.

First, we need an instance of Purview, which provides access to the Purview Studio tool. Using this tool, some data repositories should be cataloged, so that searching for data assets in the catalog returns some results. A good example would be creating an Azure SQL Database with the built-in sample data and cataloging it with Purview. It is assumed that this Azure Purview setup is already in place and data assets are already cataloged.

Next, we need an Azure Synapse workspace, which provides access to the Synapse Studio tool. This is the primary administrative console for operating the Synapse pools. Once this setup is in place, it would look as shown below, and with this, we are ready to start our exercise of integrating Azure Synapse with Azure Purview.

Configuring Azure Purview for integrating with Azure Synapse Analytics

Open Azure Synapse Studio by clicking on the Open Synapse Studio link on the dashboard page of the Azure Synapse workspace. Click on the Manage blade, and you will see Azure Purview (Preview) under the External connections section as shown below. This feature allows us to integrate Synapse with Purview, and it is still in preview as of this writing.

As shown above, we need to start by connecting our Azure Purview account here. Click on the button named Connect to a Purview account. A pop-up screen opens as shown below. If the Azure Purview account is in the same Azure subscription as the Azure Synapse Analytics workspace, you will find the Purview account name listed when you select the “From Azure Subscription” option, as shown below.

Select the Purview account and click on the Apply button. This registers the Purview account with Azure Synapse and completes the integration. Once done, you will receive a successful registration confirmation as shown below.

The benefit of connecting Azure Synapse with Azure Purview is that we can access the data assets from the catalog right in Azure Synapse Studio, and also use this information to initiate different actions supported by Synapse. To start accessing the Purview catalog from Synapse Studio, navigate to the Data tab and click on the search bar at the top of the screen as shown below. The search bar has a drop-down with two options: Workspace and Purview. Make sure to select Purview as shown below. Now we are ready to start searching the catalog for data assets.

Type a full or partial name of the database object that we intend to search for as shown below, and a list of database objects that match the search criteria is displayed. These search results should not be confused with the database objects hosted in the Synapse pools that are part of the Synapse workspace. As we are searching the Purview catalog, the results consist only of data assets held in that specific Purview account. To search for items within the workspace, select the Workspace option in the drop-down, which lists matching objects in Azure Synapse.

The results are divided into two panes: the filters pane and the results pane. The filters pane shows the data asset type, classification, and other such filters related to cataloged data assets. The results that meet the filter criteria are shown in the right pane, listing the name of each data object, the type of repository that holds it, and its address.

Let’s say that we intend to explore the details of a particular data asset to understand whether it is suitable as a source of data for data warehousing. We can click on the item in the results pane to see its details as shown below. In this case, it’s an Azure SQL Database table, so details like Schema, Lineage, Data Classification, Related database objects, etc. are shown. On the right side of this screen, we can find the hierarchy to which this database object belongs.

Another interesting and useful feature of these results can be found in the Related tab. At times, the database object we find may not be the exact match for what we are searching for. Finding objects that are similar or related to the object being searched for improves the chances of finding the database object of interest. The Related tab shows database objects like databases, schemas, tables, or views, depending on the hierarchy selected, as shown below.

Once the data object of interest has been discovered, the next step is to take corresponding actions like creating a linked service, an integration dataset, or a new data flow to source the data from the corresponding data repository. The Connect and Develop menu items provide links to initiate such actions as shown below. Clicking on these links opens a new pop-up window or wizard with the details of the data source and the data object already pre-populated. We can provide the credentials, build the corresponding artifact in Azure Synapse, and start sourcing the data from the targeted object.

The benefit of this integration is that we do not need to switch between two sets of services, separately obtain access to a catalog that may be maintained by a data steward or data quality team, or port details back and forth between Azure Purview and Azure Synapse. The built-in integration eliminates all this overhead and provides the convenience of a catalog right within the operational console of a data warehousing environment.

Ruthie Bugala

1619601744

Azure Synapse Analytics Database CI/CD using Azure Function

In this article, I will discuss an approach to Azure database CI/CD using an Azure Function hosted on the Premium plan and a Jenkins pipeline. I will only explain the architecture and the approach I took to implement the database CI/CD pipeline.

Problem Statement and Challenges

I was working on a project where I had to build a database deployment pipeline using an enterprise GitHub instance that was only accessible through the company’s internal network. In addition, port 1433 was blocked from the internal network to the Azure Synapse public endpoint for security reasons. Hence, the only option was to run the pipeline inside the internal network, so that it could reach GitHub (which I was using as source control for database deployments), and to send the SQL code to Azure Synapse through an HTTP POST to an Azure Function, since port 1433 was blocked.
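The article only describes the architecture, so the following is a hedged sketch of the pipeline side rather than the author's implementation. It splits a deployment script on `GO` separator lines (a client-side batch separator understood by tools such as sqlcmd and SSMS, not T-SQL itself) and builds, without sending, an HTTP POST to the function, authenticating with the Azure Functions `x-functions-key` header. The JSON payload shape `{"batches": [...]}`, the function URL and the key are hypothetical; the function that receives the batches and executes them against Azure Synapse is not shown.

```python
import json
import re
import urllib.request

# "GO" alone on a line (any case, optional CR for Windows line endings).
# Count suffixes such as "GO 5" are deliberately not handled in this sketch.
_GO_LINE = re.compile(r"^[ \t]*GO[ \t]*\r?$", re.IGNORECASE | re.MULTILINE)


def split_batches(script: str) -> list[str]:
    """Split a SQL script on GO lines, dropping empty batches."""
    return [batch.strip() for batch in _GO_LINE.split(script) if batch.strip()]


def build_deploy_request(function_url: str, function_key: str, script: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the script's batches as JSON."""
    payload = {"batches": split_batches(script)}  # hypothetical contract with the function
    return urllib.request.Request(
        function_url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "x-functions-key": function_key},
    )


if __name__ == "__main__":
    demo = "CREATE TABLE dbo.Demo (id INT);\nGO\nINSERT INTO dbo.Demo VALUES (1);\nGO\n"
    req = build_deploy_request("https://example-func.azurewebsites.net/api/deploy", "<function-key>", demo)
    print(req.get_method(), req.full_url, json.loads(req.data))
```

In a Jenkins stage, the request would then be sent with `urllib.request.urlopen(req)` (or any HTTP client), once per changed script; keeping batch splitting on the pipeline side means the function only ever receives plain T-SQL.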
