Copy data into Azure Synapse Analytics using the COPY command

Copy data into Azure Synapse Analytics using the COPY command

In this article, we will learn how to copy data from an external data source using the COPY command into a dedicated SQL pool of Azure Synapse Analytics.

In this article, we will learn how to copy data from an external data source using the COPY command into a dedicated SQL pool of Azure Synapse Analytics.

Introduction

Azure Synapse Analytics is a data warehouse offering on Azure cloud that comes packed with three types of runtimes for processing data using serverless SQL pools, dedicated SQL pools and Spark pools. The dedicated SQL pool provides dedicated storage and processing framework, where one can host and process data in a massively distributed manner using the SQL runtime engine. To process the data on this runtime, data is imported or copied from a variety of data sources on the Azure data ecosystem using means like Azure Databricks, Data Pipelines and other means. Though these are some of the means to create and operate data pipelines, development teams often need the means to import data using the query language itself. Azure Synapse Analytics provides a COPY command which is a standard command offered by many industry leading cloud data warehouses. This command facilitates the copy of data into the local data objects in the dedicated SQL pool of Azure Synapse. In this article, we are going to understand the process to copy data into a dedicated SQL pool of Azure Synapse Analytics using the COPY command.

Pre-requisites

Data exists in file format on containers in the Azure Storage account. These files can be processed and analyzed using various mechanisms like Azure Data Lake Analytics. One of the latest means of analyzing this data is by using Azure Synapse Analytics. We need to copy data from the files hosted on this storage account to a table in the dedicated SQL pool of Synapse. We would need some setup in place, before we can start this exercise.

Firstly, we need a Gen 2 storage account in Azure, where we may host a sample data file in a container. For this create an account if you don’t have one already and host a file in a container and keep the path of the file handy. Then we need an Azure Synapse Analytics workspace with a dedicated SQL pool created in it. This is the repository where we would be copying the files from our storage account. It is assumed that this setup is in place before proceeding with the rest of the exercise.

COPY Command

Let’s look at the syntax of the COPY command which would provide a great deal of insight into the capabilities of this command. The syntax of the COPY command is as shown below:

We can make few observations just from the syntax of the COPY command itself as shown below:

  • This command natively supports CSV, Parquet and ORC format files with Gzip or Snappy compression. We can register external file formats as well
  • We can provide credentials like Shared Access Signature or the Storage Account Key to read data from the storage account
  • We can store the error output in a separate file which would be directly saved on the storage account using the provided credentials for error logging
  • We can specify to terminate copying of data is the maximum errors go beyond a defined threshold
  • We get several options to specify the structure of the file like row terminator, field terminator, date format, encoding etc

Now that we understand the details of this command, we are ready to copy the data. As evident from the syntax of the command, it expects a database object in which it will copy the data. So first we need to create a table in the dedicated SQL pool where we would copy data. Open SSMS and connect to the Azure Synapse Analytics endpoint and create a new table that has a schema like a source file. Here we are creating a table called Trip which we would populate from the publicly available NYCTaxi dataset file.

azure sql azure

What is Geek Coin

What is GeekCash, Geek Token

Best Visual Studio Code Themes of 2021

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Introduction to Structured Query Language SQL pdf

SQL stands for Structured Query Language. SQL is a scripting language expected to store, control, and inquiry information put away in social databases. The main manifestation of SQL showed up in 1974, when a gathering in IBM built up the principal model of a social database. The primary business social database was discharged by Relational Software later turning out to be Oracle.

How to set up Azure Data Sync between Azure SQL databases and on-premises SQL Server

In this article, you learn how to set up Azure Data Sync services. In addition, you will also learn how to create and set up a data sync group between Azure SQL database and on-premises SQL Server.

Demo: Configure Azure SQL with Azure CLI | Azure SQL for beginners (Ep. 17)

In this video, see how to configure your Azure SQL connectivity leveraging the Azure CLI and PowerShell notebooks in Azure Data Studio. For the full Azure SQ...

Sourcing data from Azure SQL Database in Azure Machine Learning

In this tutorial, we will show how to source data from Azure SQL Database to use in a Machine Learning workflow.

What is Azure SQL? | Azure SQL for beginners (Ep. 3)

Azure SQL is composed of Azure SQL Database, Azure SQL Managed Instance, and SQL Server in Azure VM. Learn about the key differentiators between them. For th...