In this article, you'll learn how to integrate a MongoDB data source into SQL Server 2019 using data virtualization. You'll also see how SQL Server 2019, through data virtualization and PolyBase, provides a platform for building a modern enterprise data hub. The article covers the following topics:

  1. Discuss data virtualization
  2. Prerequisites for setting up MongoDB
  3. Set up a MongoDB external connection in SQL Server 2019
  4. And more…

Introduction

The advent of data virtualization in SQL Server 2019 helps solve modern, complex data challenges. With PolyBase, SQL Server 2019 can act as a data hub from which you directly query data held in several heterogeneous data sources. These data sources include Azure SQL Managed Instance, Oracle, Teradata, SAP HANA, MongoDB, Hadoop clusters, Cosmos DB, and SQL Server. We can query these sources using T-SQL, and for the built-in connectors there is no need to install driver software separately.

Data virtualization in SQL Server 2019 offers an alternative to the traditional ETL process. Another advantage is that it allows data from different sources, such as Azure SQL Managed Instance, SQL Server, MongoDB, Oracle, DB2, Cosmos DB, and the Hadoop Distributed File System (HDFS), to be integrated without much data movement between source and destination. This is made possible by PolyBase connectors.
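Before any connector can be used, the PolyBase feature has to be installed and enabled on the instance. The following is a minimal sketch of that check, assuming you have permission to change server-level configuration:

```sql
-- Returns 1 if the PolyBase feature is installed on this instance, 0 otherwise.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Turn the PolyBase feature on and apply the configuration change.
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;
```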

Note:

  • Using T-SQL, we can query heterogeneous data sources through PolyBase connectors. External tables provide the bridge for querying external data sources such as SQL Server, Oracle, Teradata, MongoDB, and ODBC data sources
  • SQL Server 2019 also supports the UTF-8 encoding format
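
As a rough illustration of querying through an external table: once the table exists, it can be read and joined like any local table. All object and column names below (dbo.mongo_customers, dbo.Orders, and their columns) are hypothetical and only show the shape of such a query:

```sql
-- dbo.mongo_customers is assumed to be an external table over a MongoDB collection;
-- dbo.Orders is assumed to be an ordinary local SQL Server table.
SELECT c.customer_name,
       COUNT(o.order_id) AS order_count
FROM dbo.mongo_customers AS c
LEFT JOIN dbo.Orders AS o
       ON o.customer_id = c.customer_id
GROUP BY c.customer_name;
```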

Get started:

In this section, you will learn how to set up secure access to the underlying data source.

In this case, PolyBase relies on MongoDB's own security model to access the data. In most cases, read permission on the data is all we need. The credentials used to read the data are stored inside the PolyBase data hub.

To set up data virtualization, follow the steps below:

  1. Set up the database master key
  2. Create database-scoped credentials
  3. Configure external data sources
  4. Create PolyBase external tables
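
Assuming the database master key is already in place, the remaining objects can be sketched in T-SQL as follows. The data source is created before the external table because the table definition refers to it. Every name here (mongo_credential, mongo_reader, mongo_source, the host, the sales.customers collection, and the columns) is a hypothetical placeholder, not a value from a real environment:

```sql
-- Database-scoped credential: stores the MongoDB login used to read the data.
CREATE DATABASE SCOPED CREDENTIAL mongo_credential
WITH IDENTITY = 'mongo_reader', SECRET = '<mongodb-password>';

-- External data source: points PolyBase at the MongoDB server.
-- The host is made up; 27017 is MongoDB's default port.
CREATE EXTERNAL DATA SOURCE mongo_source
WITH (
    LOCATION = 'mongodb://mongo-host.example.com:27017',
    CREDENTIAL = mongo_credential
);

-- External table: maps one collection to a relational shape.
-- LOCATION takes the form database.collection; the columns are illustrative.
CREATE EXTERNAL TABLE dbo.mongo_customers (
    [_id]           NVARCHAR(24)  NOT NULL,
    [customer_id]   INT,
    [customer_name] NVARCHAR(100)
)
WITH (
    LOCATION = 'sales.customers',
    DATA_SOURCE = mongo_source
);
```

No data is copied by these statements; the external table only holds metadata, and rows are fetched from MongoDB when the table is queried.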

To configure data virtualization, select the database. Right-click the database and select Create External Table, which starts the data virtualization wizard.

  1. Select a data source
  2. Create a database master key

In this section, we will see how to create a database master key. The master key is created inside the SQL Server database that acts as the data hub.

The master key protects the credentials that are used to read data from the external data source. It is always recommended to choose a complex password for the master key. In addition, use the BACKUP MASTER KEY command to back up the master key.
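
A minimal sketch of both commands, to be run in the database that acts as the data hub. The passwords and the backup file path are placeholders chosen for illustration:

```sql
-- Create the database master key (placeholder password; choose a strong one).
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong-master-key-password>';

-- Back the key up to a file, protected by its own password.
BACKUP MASTER KEY TO FILE = 'C:\Backup\datahub_master_key.key'
    ENCRYPTION BY PASSWORD = '<strong-backup-password>';
```

Keeping the backup file and its password somewhere other than the server itself makes it possible to restore the key, and with it the stored credentials, if the database has to be rebuilt.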


Data Virtualization with MongoDB using PolyBase in SQL Server 2019