JSON in your Azure SQL Database? Let’s benchmark some options!

Introduction

Storing and retrieving data from JSON fragments is a common need in many application scenarios, like IoT solutions or microservice-based architectures. You can persist these fragments in a variety of data stores, from blob storage or file shares to relational and non-relational databases, and there’s a long-standing debate in the industry about which database technology is the “better” fit for this task.

Azure SQL Database offers several options for parsing, transforming, and querying JSON data. This article doesn’t pretend to settle that debate definitively, but rather explores these options for common scenarios like loading and retrieving data, and benchmarks the results to give a clear indication of how Azure SQL Database performs when manipulating JSON data.

Our test bench

To reproduce a realistic scenario, we started from a JSON fragment representing a real telemetry message from a machine on a factory floor, containing ~60 attributes with an average size of 1.6 KB.
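
The full message isn’t reproduced here, but a trimmed-down fragment along these lines (attribute names are hypothetical, reduced to a handful of the ~60 attributes) gives an idea of its shape:

    {
      "deviceId": "machine-042",
      "timestamp": "2020-06-15T08:32:11Z",
      "plantId": "plant-eu-01",
      "temperature": 71.4,
      "pressure": 2.31,
      "vibration": 0.002,
      "status": "running"
    }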

We then identified two table structures in which to persist these JSON messages and against which to execute our various tests.

The first is storing the plain JSON fragments in an nvarchar(max) column, an approach often adopted by customers who don’t want to deal with a relational model.
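
A minimal definition for this approach might look like the following sketch (table and column names are illustrative, not the exact schema from our tests; the ISJSON constraint is optional but guarantees only valid JSON gets stored):

    CREATE TABLE dbo.TelemetryJson
    (
        Id bigint IDENTITY(1,1) NOT NULL,
        JsonData nvarchar(max) NOT NULL
            CONSTRAINT CK_TelemetryJson_IsJson CHECK (ISJSON(JsonData) = 1)
    );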

The second approach we wanted to test is instead “shredding” all attributes of the JSON document into dedicated columns of a relational table.
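
Again as an illustrative sketch, shortened to a few representative columns out of the ~60 attributes (names and types are assumptions):

    CREATE TABLE dbo.TelemetryShredded
    (
        Id bigint IDENTITY(1,1) NOT NULL,
        DeviceId varchar(50) NOT NULL,
        EventTimestamp datetime2 NOT NULL,
        PlantId varchar(50) NULL,
        Temperature float NULL,
        Pressure float NULL,
        Vibration float NULL,
        DeviceStatus varchar(20) NULL
        -- ...plus one column for each remaining JSON attribute
    );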

Loading data

The first scenario we want to test is how efficiently we can load one or many documents generated by an application into these two table structures. To eliminate any variability related to application implementation and potential latency in database interactions, we dynamically generate the row batches to insert (1, 1,000, 10,000, 100,000, and 3M rows) on the server side, with some randomized attribute values, and insert them into the target table through a single INSERT…SELECT statement.
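
A minimal sketch of this pattern (it assumes the illustrative dbo.TelemetryJson table above; the attribute names and randomization logic are ours, not the exact test code):

    -- Generate @BatchSize synthetic JSON rows entirely on the server side
    DECLARE @BatchSize int = 100000;

    WITH n AS
    (
        SELECT TOP (@BatchSize)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
        FROM sys.all_objects AS a
        CROSS JOIN sys.all_objects AS b
    )
    INSERT INTO dbo.TelemetryJson (JsonData)
    SELECT N'{"deviceId":"machine-' + CAST(rn % 100 AS nvarchar(10)) + N'",'
         + N'"timestamp":"' + CONVERT(nvarchar(33), SYSUTCDATETIME(), 126) + N'",'
         + N'"temperature":' + CAST(CAST(20 + RAND(CHECKSUM(NEWID())) * 80 AS decimal(5,2)) AS nvarchar(12)) + N'}'
    FROM n;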

For the schematized table we take a similar approach to generate the synthetic rowset, but we use the OPENJSON function to automatically shred all JSON attributes into table columns, again inserting all rows with an INSERT…SELECT statement.
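
Once more a sketch rather than the exact test code; it assumes a template message in a variable and relies on OPENJSON’s WITH clause to map JSON properties to typed columns:

    DECLARE @json nvarchar(max) =
        N'{"deviceId":"machine-042","timestamp":"2020-06-15T08:32:11Z","temperature":71.4}';

    -- OPENJSON shreds each property into a typed column, driven by the WITH clause;
    -- datetimeoffset handles the trailing 'Z' and converts implicitly to datetime2
    INSERT INTO dbo.TelemetryShredded (DeviceId, EventTimestamp, Temperature)
    SELECT j.deviceId, j.[timestamp], j.temperature
    FROM OPENJSON(@json)
    WITH
    (
        deviceId    varchar(50)    '$.deviceId',
        [timestamp] datetimeoffset '$.timestamp',
        temperature float          '$.temperature'
    ) AS j;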

For each table, we then measure loading these row batches with different indexing strategies:

  1. Heap (no clustered index)
  2. Clustered index
  3. Clustered Columnstore (trickle, i.e. individual inserts wrapped in a BEGIN/COMMIT transaction)
  4. Clustered Columnstore (bulk inserts)

For #3, it’s worth stressing the importance of batching multiple individual insert operations and wrapping them in an explicit transaction: this mitigates both the latency between the application and the database layer, and the latency introduced by the individual transaction log writes typical of autocommit transactions.
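
As a minimal illustration of the pattern (not our exact test harness), wrapping a loop of singleton inserts in one explicit transaction turns many small log flushes into a single flush at commit:

    -- Trickle insert: 1,000 singleton INSERTs in one explicit transaction,
    -- so the log is hardened once at COMMIT instead of once per statement
    DECLARE @i int = 0;
    BEGIN TRANSACTION;
    WHILE @i < 1000
    BEGIN
        INSERT INTO dbo.TelemetryJson (JsonData)
        VALUES (N'{"deviceId":"machine-042","temperature":71.4}');
        SET @i += 1;
    END;
    COMMIT TRANSACTION;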

This topic has been extensively covered in another article comparing and contrasting all batching options. Having multiple writers can potentially create concurrency issues that vary depending on the combination of indexing and the rows already present in the table: this article describes in detail the impact on concurrency and performance for the various options.

For Columnstore indexes, for example, each thread loads data exclusively into its own rowset by taking an X lock on it, which allows parallel loading across concurrent sessions. It’s worth mentioning, though, that in typical IoT scenarios like the one described here, events and messages are usually sent to an event store like Azure IoT Hub or Kafka and then processed by a relatively small number of event processors that execute the data loading, so concurrency can be less of an issue.

We then repeated our data loading tests at different Azure SQL Database sizes, namely Business Critical with 2, 16, and 40 vCores, to verify whether the higher “Max log rate” limit of larger instances impacts data loading performance.

Loading times for all options are represented in the following matrix:

Cells with a green background represent the best loading times for each batch size in each scale tier. As we can see, bulk loading plain JSON into a Clustered Columnstore index is around 3x faster than the schematized approach we tested, across practically all batch sizes.

Comparing the various indexing strategies within each data model, instead, the differences between bulk loading into a heap, a rowstore clustered index, or a clustered columnstore index are much smaller, in the range of 20-30%.

It’s also interesting to see the level of compression that Clustered Columnstore achieves for both the plain JSON and the schematized approach (from ~5 GB down to 204 MB and 122 MB, respectively). In IoT scenarios where you have to store billions of messages per month, this can mean significant cost savings compared to many other database technologies.
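
If you want to verify the footprint on your own tables, sp_spaceused reports the space actually reserved after columnstore compression; for example:

    -- Compare the on-disk size of the two approaches after loading
    EXEC sp_spaceused N'dbo.TelemetryJson';
    EXEC sp_spaceused N'dbo.TelemetryShredded';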

Scaling from 2 vCores to 40 vCores provides only a ~20% performance improvement because, in this particular test, we execute the bulk load on a single connection and never max out resource consumption in any one area (CPU, IO, or memory). If we executed multiple bulk loading operations in parallel across many connections, we would clearly see a different picture.

#azure sql #developers #json #azure #sql
