In this tutorial, you will learn to process file sets with U-SQL in Azure Data Lake Analytics.
This article will help you process file sets with U-SQL in Azure Data Lake Analytics.
In the previous articles, we learned how to create database objects in Azure Data Lake Analytics using U-SQL. We also looked at some of the frequently used DDL and DML commands. So far, we have learned different aspects of Azure Data Lake Analytics and U-SQL using a single file stored in Azure Data Lake Storage as the source data. In real-world scenarios, often there are tens to thousands of files that are stored on the data lake storage, and the same must be processed in parallel based on specific criteria. Considering Azure Data Lake Storage forms the storage layer of a data lake in Azure, massive volumes of data hosted in different types of files are expected to be stored and processed. We will understand how to process this data using U-SQL in this article.
In the previous parts of this Azure Data Lake Analytics series, we installed Visual Studio Data Tools and set up a sample U-SQL application that has a few sample scripts as well as sample data. To proceed with the rest of the steps explained in this article, one would need this setup on their local machine. It’s assumed that this setup is in place.
Navigate to the solution explorer window and expand the TestData as shown below. You would be able to see the sample data files. The files starting with the term “vehicle” are the series of files that resemble the types of files that typically exist in production environments. These files are classified by numbers and dates. We already installed sample data on the Azure Data Lake Storage account in the previous parts of this series, so the data visible below exists in the same folder hierarchy on our Azure Data Lake account as well.
If you open any one of the files, you would be able to see the schema and data as shown below. These files are in CSV format, and contains a few fields and a reasonable number of records as well.
If you open the same file using Azure Data Lake Explorer, you would be able to preview the data in a more formatted manner. Navigate back to the solution explorer window and open the script titled Ambulance-3-1-FileSets as shown below. This script contains the U-SQL code that would process the data in these vehicle-related files.
Let’s attempt to understand this U-SQL script step-by-step.
In the next step, we are filtering and selecting the extracted data based on a filter criterion. We have specified predicate to select where vid is equal to filter files based on a vid and date range. If you analyze the names of files, you would find four files that have the value of “vehicle1” as the name of the file. So only those files will be selected. And then the date range also will be applied.
SQL stands for Structured Query Language. SQL is a scripting language expected to store, control, and inquiry information put away in social databases. The main manifestation of SQL showed up in 1974, when a gathering in IBM built up the principal model of a social database. The primary business social database was discharged by Relational Software later turning out to be Oracle.
In this article, you learn how to set up Azure Data Sync services. In addition, you will also learn how to create and set up a data sync group between Azure SQL database and on-premises SQL Server.
In this video, see how to configure your Azure SQL connectivity leveraging the Azure CLI and PowerShell notebooks in Azure Data Studio. For the full Azure SQ...
In this tutorial, we will show how to source data from Azure SQL Database to use in a Machine Learning workflow.
Azure SQL is composed of Azure SQL Database, Azure SQL Managed Instance, and SQL Server in Azure VM. Learn about the key differentiators between them. For th...