This article will help you create database objects in Azure Data Lake Analytics using U-SQL.

Introduction

In the fourth part of this article series, Deploying U-SQL jobs to the Azure Data Analytics account, we learned how to deploy U-SQL jobs on Data Lake Analytics. So far, we learned the basics of how to query semi-structured or unstructured data using U-SQL as well as develop U-SQL jobs locally using Visual Studio. While processing large volumes of data stored in files is one way to process big data, there are use-cases where one may need to have structured views over semi-structured or structured data. This can be compared to query engines like Hive which provides SQL-like interface over unstructured data. Using constructs like Database, Tables, Views, etc. Azure Data Lake Analytics provides a mechanism to analyze files-based data hosted on Azure Data Lake Storage account using U-SQL as the Data Definition Language as well as the Data Manipulation Language. In this article, we will learn how to use U-SQL to analyze unstructured or semi-structured data.

U-SQL Data Definition Language

An easier way to understand structured constructs in Azure Data Lake Analytics is by comparing it to SQL Server database objects. U-SQL DDL supports database objects like schemas, tables, indexes, statistics, views, functions, packages, procedures, assemblies, credentials, functions, types, partitions and data sources. By default, Azure Data Lake Analytics comes with a master database. In the previous part of this Azure Data Lake Analytics series, we created a data lake analytics account, set up a visual studio and created a sample U-SQL application on sample data. It is assumed that this setup is already in place.

Let’s say that we intend to analyze file-based data hosted in the data lake storage account. The sample application that we setup comes with a sample file called SearchLog.tsv. This file can be opened from the Sample Data directory, and it would look as shown below in the File Explorer.

Open the script file titled CreatingTable.usql and you would find the script that creates database objects using U-SQL as shown below:

  • Drop Database statement drops any existing database
  • Create Database statement creates a new database
  • Use the Database statement switches the context to the specified database
  • U-SQL databases have built-in and default schema named dbo. Optionally one can create additional schemas as well
  • Create Table statement creates a new table, which provides a database management system-level optimization in comparison to file-based data analytics. Behind the scenes, data in U-SQL Tables are stored in the form of files. There are two types of tables in U-SQL: Managed tables and External tables, which host data natively or externally in an external data repository respectively. The below example shows the DDL to create a managed table
  • The INDEX keyword in the table definition is specified to create a new index
  • CLUSTERED keyword specified the types of index and the fields on which the index should be created
  • DISTRIBUTED BY specified the keys or values on which the data in the table should be distributed

#azure #sql azure #u-sql #azure data lake analytics

Creating database objects in Azure Data Lake Analytics using U-SQL
1.25 GEEK