Why does my login to MS SQL with AzureML dataprep using Windows authentication fail?

I tried connecting to a MS SQL database using&nbsp;<code><a href="https://docs.microsoft.com/en-us/python/api/azureml-dataprep/azureml.dataprep?view=azure-dataprep-py" target="_blank">azureml.dataprep</a></code>&nbsp;in an Azure Notebook, as outlined in&nbsp;<a href="https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-load-data#load-sql-data" target="_blank">https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-load-data#load-sql-data</a>, using&nbsp;<a href="https://docs.microsoft.com/en-us/python/api/azureml-dataprep/azureml.dataprep.mssqldatasource?view=azure-dataprep-py" target="_blank">MSSqlDataSource</a>, using code of the form

I tried connecting to a MS SQL database using azureml.dataprep in an Azure Notebook, as outlined in https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-load-data#load-sql-data, using MSSqlDataSource, using code of the form

import azureml.dataprep as dprep

secret = dprep.register_secret(value="[SECRET-PASSWORD]", id="[SECRET-ID]")

ds = dprep.MSSQLDataSource(server_name="[SERVER-NAME]",
database_name="[DATABASE-NAME], [PORT]",
user_name="[DATABASE-USERNAME]",
password=secret)

Setting [DATABASE-USERNAME] equal to MYWINDOWSDOMAIN\MYWINDOWSUSERNAME and the password [SECRET-PASSWORD] coinciding with my Windows password (i.e. trying to use Windows authentication).

After firing a query with

dataflow = dprep.read_sql(ds, "SELECT top 100 * FROM [dbo].[MYTABLE]")
dataflow.head(5)

I get

ExecutionError: Login failed.

I could connect to other databases without Windows Authentication fine. What am I doing wrong?

Storage Considerations for Running SQL Server in Azure

Storage Considerations for Running SQL Server in Azure

<strong>Originally published by </strong>&nbsp;<a href="https://dzone.com/users/3368561/daveberm.html" target="_blank">David Bermingham</a><strong> </strong><em>at&nbsp;</em><a href="https://dzone.com/articles/storage-considerations-for-running-sql-server-in-a" target="_blank">dzone.com</a>

Originally published by  David Bermingham at dzone.com

Explore some storage considerations for running SQL Server in Azure.

If you are deploying SQL Server in Azure or any Cloud platform for that matter, instead of just provisioning storage like you did for your on-premises deployments for many years, you may consider that storage in Azure isn't exactly like the storage you may have had access to on-premises. Some traditional "best practices" may wind up costing you additional money and give you less than optimal performance, all while not providing you any of the intended benefits. Much of what I am about to discuss is also described in Performance Guidelines for Azure in SQL Server Virtual Machines.

I'm not here to tell you that you must use UltraSSD, Premium Storage, or any other disk type. You just need to be aware that you have options and know what each disk type brings to the table. Of course, like anything else in the cloud, the more money you spend, the more power, speed, throughput, etc. you will achieve. The trick is finding the optimal configuration so that you spend just enough to achieve the desired results.

Size DOES Matter

Like many things in the cloud, certain specs are tied together. For servers, if you want more RAM, you often get more CPU even if you didn't NEED it. For storage, IOPS, throughput, and size are all tied together. If you want more IOPS, you need a bigger disk. If you need more space, you also get more IOPS. Of course, you can jump between storage classes to circumvent that to some extent, but it still holds true that if you need more IOPS. You also get more space on any of the different storage types.

The size of your virtual machine instance also matters. Regardless of what storage configuration you eventually go with, the overall throughput will be capped at whatever the instance size allows. So once again, you may need to pay for more RAM and CPU than you need just to achieve your desired storage performance. Make sure you understand what your instance size can support in terms of max IOPS and MBps throughput. Many times the instance size will turn out to be the bottleneck in a perceived storage performance problem in Azure.

RAID 0 is traditionally the 3rd rail of storage configuration options. Although it provides the best combination of performance and storage utilization of any RAID option, it does so at the risk of a catastrophic failure. If just a single disk in a RAID 0 stripe set should fail, the entire stripe set fails. So traditionally RAID 0 is only used in scenarios where data loss is acceptable and high performance is desirable.

However, in Azure software, RAID 0 is desirable and even recommended in many situations. How can we get away with RAID 0 in Azure? The answer is easy. Each disk you present to an Azure virtual machine instance already has triple redundancy on the backend, meaning you would need to have multiple failures before you would lose your stripe set. By using RAID 0, you can combine multiple disks and the overall performance of the combined stripe set will increase by 100 percent for each additional disk you add to the stripe set.

So, for example, if you had a requirement of 10,000 IOPS, you might think that you need UltraSSD since Premium Storage maxes out at 7,500 IOPS with a P50. However, if you put two P50s in a RAID 0, you now have the potential to achieve up to 15,000 IOPS, assuming you are running a Standard_F16s_v2 or similarly large instance size that supports that many IOPS.

In Windows 2012 and later, RAID 0 is achieved by creating a Simple Storage Space. In Windows Server 2008 R2 you can use Dynamic Disks to create a RAID 0 Striped Volume. Just a word of caution, if you are going to use a local Storage Space and also configure Availability Groups or a SANless Failover Cluster Instance with DataKeeper, it is best to configure your storage BEFORE you create a cluster.

Just a reminder, you only have about two more months to move your SQL Server 2008 R2 instances to Azure. Check out my post on how to deploy a SQL Server 2008 R2 FCI on Azure to ensure high availability.

Don't Bother Separating Log and Data Files

Traditionally, log and data files would reside on different physical disks. Log files tend to have a lot of write activity and data files tend to have more read activity, so sometimes storage would be optimized based on those characteristics. It was also desirable to keep log and data files on different disks for recovery purposes. If you should lose one or the other, with a proper backup strategy in place you could recover your database with no data loss.

With cloud-based storage, the likelihood of losing just a single volume is EXTREMELY low. If, by chance, you lose storage, it is likely your entire storage cluster, along with the triple redundancy, went to lunch. So while it may feel right to put logs in E:\ logs and data in F:\data, you really are doing yourself a disservice. For example, if you provision a P20 for logs and a P20 for data, each volume will be 512 GiB in size and capped at 2,300 IOPS. And just think, you may not need all that size for log files, but it might not give you much room to grow for your data files, which will eventually require moving to a more expensive P30 just for the extra space.

Wouldn't it be much nicer to simply stripe those two volumes together into a nice large 1 TB volume that supports 4,600 IOPS? By doing that, both the log and data files can take advantage of the increased IOPS, and you have also just optimized your storage utilization and decreased your cloud storage cost by putting off the move to a P30 disk for your data file.

The same holds true files and filegroups. Really think hard about what you are doing and whether it still makes sense once you move to the cloud. What makes sense might be counter-intuitive to what you have done in the past. When in doubt, follow the KISS rule, Keep It Simple Stupid! The beauty of the cloud is that you can always add more storage, increase instance size, or do whatever it takes to optimize performance vs. cost.

What to do About TempDB

Use the local SSD, aka, the D: drive. The D drive is going to be the best location for your tempdb. Because it is a local drive the data is considered "temporary," meaning it can be lost if a server is moved, rebooted, etc. That's okay, tempdb is recreated each time SQL starts anyway. The local SSD is going to be fast and have low latency, but because it is local the reads and writes to it do not contribute to the overall storage IOPS limit of the instance size, so effectively, it is FREE IOPS, so why not take advantage? If you are building a SANless SQL Server FCI with SIOS DataKeeper, be sure to create a non-mirrored volume resource of the D drive so you don't needlessly replicate TempDB.

Mount Points Become Obsolete

Mount Points are commonly used in SQL Server FCI configurations when multiple instances of SQL Server are installed on the same Windows Cluster. This reduces the overall cost of SQL Server licenses and can help save costs by driving higher server utilization. As we discussed in the past, typically, there might be five or more drives associated with each SQL Server instance. If each of those drives had to consume a drive letter you would run out of letters in just about three to four instances. So instead of giving each drive a letter, mount points were used so that each instance could just be serviced by a single drive letter, the root drive. The root drive has mount points that map to separate physical disks that don't have drive letters.

However, as we discussed above, the concept of using a bunch of individual disks really doesn't make a lot of sense in the cloud, hence mount points become obsolete in the cloud. Instead, create a RAID 0 stripe, as described, and each clustered instance SQL Server will simply have its own individual volume that is optimized for space, performance, and cost. This solves the problem of running out of drive letters and gives you much better storage utilization and performance while also reducing the cost of your cloud storage.

Conclusions

This post is meant as a jumping off point, not a definitive guide. The main point of the post is to get you thinking differently about cloud and storage as it pertains to running SQL Server. Don't simply take what you did on-premise and recreate it in the cloud, that will almost always result in less than optimal performance and a much larger storage bill than necessary.

---------------------------------------------------------------------------------------------------------------------------------------------


Learn More

☞ The Complete Python & PostgreSQL Developer Course

☞ SQL & PostgreSQL for Beginners

☞ An Introduction to Queries in PostgreSQL

☞ Spring Boot with PostgreSQL, Flyway, and JSONB

☞ Node.js - Express Persistent Session Store with PostgreSQL + Sequelize

☞ Secure Node.js, Express.js and PostgreSQL API using Passport.js

☞ Build a Basic App with Spring Boot and JPA using PostgreSQL

Connect to Microsoft SQL Server database on MacOS using Python

Connect to Microsoft SQL Server database on MacOS using Python

Connect to your MS SQL using python. The first thing you need is to install Homebrew. You need the copy the content in the square brackets which in my case is “ODBC Driver 13 for SQL Server”. Replace “ODBC Driver 13 for SQL Server” with the content you copied in the square brackets.

There are always situations where I want to automate small tasks. I like using Python for these kind of things, you can quickly get something working without much hassle. I needed to perform some database changes in a Microsoft SQL Server database and wanted to connect to it using Python. On Windows this is usually pretty straight forward. But I use macOS as my main operating system and I had some hurdles along the way so here is a quick how to.

Preparing

If Homebrew isn't installed yet let's do that first. It's an excellent package manager for macOS. Paste the following command in the terminal to install it:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Once finished run the following commands to install the Microsoft ODBC Driver 13.1 for SQL Server. This driver will be used to connect to the MS SQL database.

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew update
brew install [email protected] [email protected]
Python package: pyodbc or pypyodbc

With the driver installed, we'll be using a Python package that uses the driver to connect to the database. There are two frequently used packages to connect in Python: pyodbc and pypyodbcpypyodbc is a pure Python re-implementation of pyodbc. The main thing I took a way was that pypyodbcis easier to set up on different platforms. I could not get pyodbc working, it simply wouldn't connect to the database.

Installing pypyodbc can be done using pip, the python package installer.

pip install pypyodbc
Code

Now that the necessary components have been installed, let's open our favorite code editor and write code. The first thing that needs to be done is importing pypyodbc. The script starts with this:

import pypyodbc

Then we need a connection to the database:

sqlConnection = pypyodbc.connect(
                "Driver={ODBC Driver 13 for SQL Server};"
        "Server=<server IP address>;"
        "Database=<database>;"
        "uid=<username>;pwd=<password>");

Note that you have to replace four values in this code: server IP addressdatabase , username and password. The value for the driver is a hard coded value which indicates what driver to use to connect to the database, this value points to the driver that was installed earlier.

Now all what rests is use the connection and run a query.

cursor = sqlConnection.cursor()
cursor.execute("select * from Values")

The cursor now contains the result of our query, the property cursor.rowcount returns the number of rows the query returned. It's now possible to loop through the rows and access the different columns:

for row in cursor:
    print(cursor)
    # first column
    firstColumn = row[0]
    # ...

When we're done, we need to clean up the cursor and database connection by closing it.

cursor.close()
sqlConnection.close()

And that's it, save the file and use the python <filename>.py or python3 <filename.py> command, this depends on your configuration, to run. Here is the entire script:

import pypyodbc

sqlConnection = pypyodbc.connect(
"Driver={ODBC Driver 13 for SQL Server};"
"Server=<server IP address>;"
"Database=<database>;"
"uid=<username>;pwd=<password>");

cursor = sqlConnection.cursor()
cursor.execute("select * from Values")

for row in cursor:
print(cursor)
# first column
firstColumn = row[0]
# ...

cursor.close()
sqlConnection.close()

The with syntax can also be used to automatically close the cursor and the connection, this is another way of writing the same script:

import pypyodbc

with pypyodbc.connect(
"Driver={ODBC Driver 13 for SQL Server};"
"Server=<server IP address>;"
"Database=<database>;"
"uid=<username>;pwd=<password>") as sqlConnection:

with sqlConnection.cursor() as cursor:

    cursor.execute("select * from Values")

    for row in cursor:
        print(cursor)
        # first column
        firstColumn = row[0]
        # ...

If you're looking for some more reading on the topic:

Thanks for reading. If you liked this post, share it with all of your programming buddies!

Further reading

☞ Python for Time Series Data Analysis

☞ Python Programming For Beginners From Scratch

☞ Python Network Programming | Network Apps & Hacking Tools

☞ Intro To SQLite Databases for Python Programming

☞ Ethical Hacking With Python, JavaScript and Kali Linux

☞ Beginner’s guide on Python: Learn python from scratch! (New)

☞ Python for Beginners: Complete Python Programming

What are the differences between Standard SQL and Transact-SQL?

What are the differences between Standard SQL and Transact-SQL?

In this article, we'll explain syntax differences between standard SQL and the Transact-SQL language dedicated to interacting with the SQL

#1 Names of Database Objects

In relational database systems, we name tables, views, and columns, but sometimes we need to use the same name as a keyword or use special characters. In standard SQL, you can place this kind of name in quotation marks (""), but in T-SQL, you can also place it in brackets ([]). Look at these examples for the name of a table in T-SQL:

CREATE TABLE dbo.test.“first name” ( Id INT, Name VARCHAR(100));
CREATE TABLE dbo.test.[first name]  ( Id INT, Name VARCHAR(100));

Only the first delimiter (the quotation marks) for the special name is also part of the SQL standard.

What Is Different in a SELECT Statement?#2 Returning Values

The SQL standard does not have a syntax for a query returning values or values coming from expressions without referring to any columns of a table, but MS SQL Server does allow for this type of expression. How? You can use a SELECT statement alone with an expression or with other values not coming from columns of the table. In T-SQL, it looks like the example below:

SELECT 12/6 ;

In this expression, we don’t need a table to evaluate 12 divided by 6, therefore, the FROM statement and the name of the table can be omitted.

#3 Limiting Records in a Result Set

In the SQL standard, you can limit the number of records in the results by using the syntax illustrated below:

SELECT * FROM tab FETCH FIRST 10 ROWS ONLY

T-SQL implements this syntax in a different way. The example below shows the MS SQL Server syntax:

SELECT * FROM tab ORDER BY col1 DESC OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY;

As you notice, this uses an ORDER BY clause. Another way to select rows, but without ORDER BY, is by using the TOP clause in T-SQL:

SELECT TOP 10 * FROM tab;
#4 Automatically Generating Values

The SQL standard enables you to create columns with automatically generated values. The syntax to do this is shown below:

CREATE TABLE tab (id DECIMAL GENERATED ALWAYS AS IDENTITY);

In T-SQL we can also automatically generate values, but in this way:

CREATE TABLE tab (id INTEGER IDENTITY);
#5 Math Functions

Several common mathematical functions are part of the SQL standard. One of these math functions is CEIL(x), which we don’t find in T-SQL. Instead, T-SQL provides the following non-standard functions: SIGN(x), ROUND(x,[,d]) to round decimal value x to the number of decimal positions, TRUNC(x) for truncating to given number of decimal places, LOG(x) to return the natural logarithm for a value x, and RANDOM() to generate random numbers. The highest or lowest number in a list in the SQL standard is returned by MAX(list) and MIN(list) functions, but in Transact-SQL, you use the GREATEST(list) and LEAST(list) functions.

T-SQL function ROUND:

SELECT ROUND(col) FROM tab;

#6 Aggregate Functions

We find another syntax difference with the aggregate functions. The functions COUNT, SUM, and AVG all take an argument related to a count. T-SQL allows the use of DISTINCT before these argument values so that rows are counted only if the values are different from other rows. The SQL standard doesn't allow for the use of DISTINCT in these functions.

Standard SQL:
SELECT COUNT(col) FROM tab;

T-SQL:
SELECT COUNT(col) FROM tab;

SELECT COUNT(DISTINCT col) FROM tab;

But in T-SQL we don’t find a population covariance function: COVAR_POP(x,y), which is defined in the SQL standard.

#7 Retrieving Parts of Dates and Times

Most relational database systems deliver many functions to operate on dates and times.

In standard SQL, the EXTRACT(YEAR FROM x) function and similar functions to select parts of dates are different from the T-SQL functions like YEAR(x) or DATEPART(year, x).

There is also a difference in getting the current date and time. Standard SQL allows you to get the current date with the CURRENT_DATE function, but in MS SQL Server, there is not a similar function, so we have to use the GETDATE function as an argument in the CAST function to convert to a DATE data type.

#8 Operating on Strings

Using functions to operate on strings is also different between the SQL standard and T-SQL. The main difference is found in removing trailing and leading spaces from a string. In standard SQL, there is the TRIM function, but in T-SQL, there are several related functions: TRIM (removing trailing and leading spaces), LTRIM (removing leading spaces), and RTRIM (removing trailing spaces).

Another very-often-used string function is SUBSTRING.

The standard SQL syntax for the SUBSTRING function looks like:

SUBSTRING(str FROM start [FOR len])

but in T-SQL, the syntax of this function looks like:

SUBSTRING(str, start, length)

There are reasons sometimes to add values coming from other columns and/or additional strings. Standard SQL enables the following syntax to do this:

As you can see, this syntax makes use of the || operator to add one string to another.

But the equivalent operator in T-SQL is the plus sign character. Look at this example:

SELECT col1 + col2  FROM tab;

In SQL Server, we also have the possibility to use the CONCAT function concatenates a list of strings:

SELECT CONCAT(col1, str1, col2, ...)  FROM tab;

We can also repeat one character several times. Standard SQL defines the function REPEAT(str, n) to do this. Transact-SQL provides the REPLICATE function. For example:

SELECT  REPLICATE(str, x);

where x indicates how many times to repeat the string or character.

#9 Inequality Operator

During filtering records in a SELECT statement, sometimes we have to use an inequality operator. Standard SQL defines <> as this operator, while T-SQL allows for both the standard operator and the != operator:

SELECT col3 FROM tab WHERE col1 != col2;
#10 ISNULL Function

In T-SQL, we have the ability to replace NULL values coming from a column using the ISNULL function. This is a function that is specific to T-SQL and is not in the SQL standard.

SELECT ISNULL(col1) FROM tab;
Which Parts of DML Syntax Are Different?

In T-SQL, the basic syntax of DELETE, UPDATE, and INSERT queries is the same as the SQL standard, but differences appear in more advanced queries. Let’s look at them.

#11 OUTPUT Keyword

The OUTPUT keyword occurs in DELETE, UPDATE, and INSERT statements. It is not defined in standard SQL.

Using T-SQL, we can see extra information returned by a query. It returns both old and new values in UPDATE or the values added using INSERT or deleted using DELETE. To see this information, we have to use prefixes in INSERT, UPDATE, and DELETE.

UPDATE tab SET col='new value'
OUTPUT Deleted.col, Inserted.col;

We see the result of changing records with the previous and new values in an updated column. The SQL standard does not support this feature.

#12 Syntax for INSERT INTO ... SELECT

Another structure of an INSERT query is INSERT INTO … SELECT. T-SQL allows you to insert data from another table into a destination table. Look at this query:

INSERT INTO tab SELECT col1,col2,... FROM tab_source;

It is not a standard feature but a feature characteristic of SQL Server.

#13 FROM Clause in DELETE and UPDATE

SQL Server provides extended syntax of the UPDATE and DELETE with FROM clauses. You can use DELETE with FROM to use the rows from one table to remove corresponding rows in another table by referring to a primary key and a foreign key. Similarly, you can use UPDATE with FROM update rows from one table by referring to the rows of another table using common values (primary key in one table and foreign key in second, e.g. the same city name). Here is an example:

DELETE FROM Book
FROM Author
WHERE Author.Id=Book.AuthorId AND Author.Name IS NULL;

UPDATE Book
SET Book.Price=Book.Price*0.2
FROM Author
WHERE Book.AuthorId=Author.Id AND Author.Id=12;

The SQL standard doesn’t provide this syntax.

#14 INSERT, UPDATE, and DELETE With JOIN

You can also use INSERT, UPDATE, and DELETE using JOIN to connect to another table. An example of this is:

DELETE ItemOrder FROM ItemOrder
JOIN Item ON ItemOrder.ItemId=Item.Id
WHERE YEAR(Item.DeliveredDate) <= 2017;

This feature is not in the SQL standard.

Summary

This article does not cover all the issues about syntax differences between the SQL standard and T-SQL using the MS SQL Server system. However, this guide helps point out some basic features characteristic only of Transact-SQL and what SQL standard syntax isn’t implemented by MS SQL Server.

Thanks for reading. If you liked this post, share it with all of your programming buddies!

Originally published on https://dzone.com