1661876880
Many of us have experienced the speed and efficiency that come from centralizing compute in a cloud data warehouse. While this is true, many of us have also realized that, like everything, this value comes with its own downsides.
One of the main drawbacks of this approach is that you have to learn and run queries in different languages, especially SQL. While writing SQL is faster and cheaper than standing up secondary infrastructure to run Python (on your laptop or on in-office servers), it comes with many different complexities depending on what information the data analyst wants to extract from the cloud warehouse. The move to cloud data warehouses increases the utility of complex SQL relative to Python. Having gone through this experience myself, I decided to write down the specific transformations that are most painful to learn and perform in SQL, and to provide the actual SQL needed to ease that pain for my readers.
To help with your workflow, I give examples of the data structure before and after each transformation so you can follow along and check your work. I have also provided the actual SQL needed to perform each of the 5 hardest transformations. You will need new SQL to run a transformation across multiple projects as your data changes. We have provided links to dynamic SQL for each transformation so you can keep generating the SQL your analysis needs!
It is unclear where the term "date spine" originated, but even those who do not know the term are probably familiar with what it is.
Imagine you are analyzing your daily sales data, and it looks like this:
sales_date | product | sales |
2022-04-14 | A | 46 |
2022-04-14 | B | 409 |
2022-04-15 | A | 17 |
2022-04-15 | B | 480 |
2022-04-18 | A | 65 |
2022-04-19 | A | 45 |
2022-04-19 | B | 411 |
There were no sales on the 16th and 17th, so those rows are missing entirely. If we were trying to calculate average daily sales or build a time-series forecasting model, this format would be a serious problem. What we need to do is insert rows for the missing days.
Here is the basic concept:
WITH GLOBAL_SPINE AS (
SELECT
ROW_NUMBER() OVER (
ORDER BY
NULL
) as INTERVAL_ID,
DATEADD(
'day',
(INTERVAL_ID - 1),
'2020-01-01T00:00' :: timestamp_ntz
) as SPINE_START,
DATEADD(
'day', INTERVAL_ID, '2020-01-01T00:00' :: timestamp_ntz
) as SPINE_END
FROM
TABLE (
GENERATOR(ROWCOUNT => 1097)
)
),
GROUPS AS (
SELECT
product,
MIN(sales_date) AS LOCAL_START,
MAX(sales_date) AS LOCAL_END
FROM
My_First_Table
GROUP BY
product
),
GROUP_SPINE AS (
SELECT
product,
SPINE_START AS GROUP_START,
SPINE_END AS GROUP_END
FROM
GROUPS G CROSS
JOIN LATERAL (
SELECT
SPINE_START,
SPINE_END
FROM
GLOBAL_SPINE S
WHERE
S.SPINE_START >= G.LOCAL_START
AND S.SPINE_START <= G.LOCAL_END
)
)
SELECT
G.product AS GROUP_BY_product,
GROUP_START,
GROUP_END,
T.*
FROM
GROUP_SPINE G
LEFT JOIN My_First_Table T ON sales_date >= G.GROUP_START
AND sales_date < G.GROUP_END
AND G.product = T.product;
The end result will look like this (days with no matching sales come back as NULL from the LEFT JOIN and are shown here as 0):
sales_date | product | sales |
2022-04-14 | A | 46 |
2022-04-14 | B | 409 |
2022-04-15 | A | 17 |
2022-04-15 | B | 480 |
2022-04-16 | A | 0 |
2022-04-16 | B | 0 |
2022-04-17 | A | 0 |
2022-04-17 | B | 0 |
2022-04-18 | A | 65 |
2022-04-18 | B | 0 |
2022-04-19 | A | 45 |
2022-04-19 | B | 411 |
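If you want to try the idea without a warehouse, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not the Snowflake query above: it assumes a hypothetical table named daily_sales, builds the spine with a recursive CTE instead of GENERATOR, and uses one global date range for all products rather than a per-product range.

```python
import sqlite3

# Hypothetical in-memory table holding the sample rows from the article.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (sales_date TEXT, product TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    [
        ("2022-04-14", "A", 46), ("2022-04-14", "B", 409),
        ("2022-04-15", "A", 17), ("2022-04-15", "B", 480),
        ("2022-04-18", "A", 65),
        ("2022-04-19", "A", 45), ("2022-04-19", "B", 411),
    ],
)

SPINE_SQL = """
WITH RECURSIVE bounds AS (
    SELECT MIN(sales_date) AS first_day, MAX(sales_date) AS last_day
    FROM daily_sales
),
spine(day) AS (
    -- one row per calendar day between the first and last sale
    SELECT first_day FROM bounds
    UNION ALL
    SELECT date(day, '+1 day') FROM spine
    WHERE day < (SELECT last_day FROM bounds)
),
products AS (SELECT DISTINCT product FROM daily_sales)
SELECT s.day, p.product, COALESCE(d.sales, 0) AS sales
FROM spine s
CROSS JOIN products p
LEFT JOIN daily_sales d
  ON d.sales_date = s.day AND d.product = p.product
ORDER BY s.day, p.product
"""

filled = conn.execute(SPINE_SQL).fetchall()
for row in filled:
    print(row)
```

The COALESCE is what turns the NULLs produced by the LEFT JOIN into zeros; drop it if you would rather keep missing days distinguishable from real zero-sales days.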
Sometimes an analysis requires restructuring a table. For example, we may have a list of students, subjects, and grades, but we want to break the subjects out into one column each. We all know and love Excel for its pivot tables. But have you ever tried to do it in SQL? Not only does every database have annoying differences in how PIVOT is supported, but the syntax is also unintuitive and easy to forget.
Before:
Student | Subject | Grade |
Jared | Mathematics | 61 |
Jared | Geography | 94 |
Jared | Phys Ed | 98 |
Patrick | Mathematics | 99 |
Patrick | Geography | 93 |
Patrick | Phys Ed | 4 |
SELECT Student, MATHEMATICS, GEOGRAPHY, PHYS_ED
FROM ( SELECT Student, Grade, Subject FROM skool)
PIVOT ( AVG ( Grade ) FOR Subject IN ( 'Mathematics', 'Geography', 'Phys Ed' ) ) as p
( Student, MATHEMATICS, GEOGRAPHY, PHYS_ED );
Result:
Student | Mathematics | Geography | Phys Ed |
Jared | 61 | 94 | 98 |
Patrick | 99 | 93 | 4 |
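Because PIVOT support differs between databases, a portable fallback is conditional aggregation: one AVG(CASE ...) per output column. The sketch below runs it in SQLite from Python; the table name grades and the lowercase column names are assumptions made for the example, not part of the query above.

```python
import sqlite3

# Hypothetical table with the "Before" rows from the article.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (student TEXT, subject TEXT, grade REAL)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [
        ("Jared", "Mathematics", 61), ("Jared", "Geography", 94), ("Jared", "Phys Ed", 98),
        ("Patrick", "Mathematics", 99), ("Patrick", "Geography", 93), ("Patrick", "Phys Ed", 4),
    ],
)

# Each CASE keeps only the grades of one subject; AVG ignores the NULLs
# produced for the other subjects, so every student collapses to one row.
PIVOT_SQL = """
SELECT
    student,
    AVG(CASE WHEN subject = 'Mathematics' THEN grade END) AS mathematics,
    AVG(CASE WHEN subject = 'Geography'   THEN grade END) AS geography,
    AVG(CASE WHEN subject = 'Phys Ed'     THEN grade END) AS phys_ed
FROM grades
GROUP BY student
ORDER BY student
"""

pivoted = conn.execute(PIVOT_SQL).fetchall()
for row in pivoted:
    print(row)
```

The trade-off is the same as with PIVOT: the list of subjects has to be known when the query is written.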
This one is not necessarily hard, but it is time-consuming. Most data scientists do not consider doing one-hot encoding in SQL. Although the syntax is simple, they would rather move the data out of the warehouse than face the tedious task of writing a 26-line CASE statement. We don't blame them!
However, we recommend taking advantage of your data warehouse and its processing power. Here is an example using STATE as the column to one-hot encode.
Before:
Babyname | State | Qty |
Alice | AL | 156 |
Alice | AK | 146 |
Alice | [unclear] | 654 |
… | … | … |
Zelda | NY | 417 |
Zelda | AL | 261 |
Zelda | CO | 321 |
SELECT *,
CASE WHEN State = 'AL' THEN 1 ELSE 0 END as STATE_AL,
CASE WHEN State = 'AK' THEN 1 ELSE 0 END as STATE_AK,
CASE WHEN State = 'AZ' THEN 1 ELSE 0 END as STATE_AZ,
CASE WHEN State = 'AR' THEN 1 ELSE 0 END as STATE_AR,
CASE WHEN State = 'AS' THEN 1 ELSE 0 END as STATE_AS,
CASE WHEN State = 'CA' THEN 1 ELSE 0 END as STATE_CA,
CASE WHEN State = 'CO' THEN 1 ELSE 0 END as STATE_CO,
CASE WHEN State = 'CT' THEN 1 ELSE 0 END as STATE_CT,
CASE WHEN State = 'DC' THEN 1 ELSE 0 END as STATE_DC,
CASE WHEN State = 'FL' THEN 1 ELSE 0 END as STATE_FL,
CASE WHEN State = 'GA' THEN 1 ELSE 0 END as STATE_GA,
CASE WHEN State = 'HI' THEN 1 ELSE 0 END as STATE_HI,
CASE WHEN State = 'ID' THEN 1 ELSE 0 END as STATE_ID,
CASE WHEN State = 'IL' THEN 1 ELSE 0 END as STATE_IL,
CASE WHEN State = 'IN' THEN 1 ELSE 0 END as STATE_IN,
CASE WHEN State = 'IA' THEN 1 ELSE 0 END as STATE_IA,
CASE WHEN State = 'KS' THEN 1 ELSE 0 END as STATE_KS,
CASE WHEN State = 'KY' THEN 1 ELSE 0 END as STATE_KY,
CASE WHEN State = 'LA' THEN 1 ELSE 0 END as STATE_LA,
CASE WHEN State = 'ME' THEN 1 ELSE 0 END as STATE_ME,
CASE WHEN State = 'MD' THEN 1 ELSE 0 END as STATE_MD,
CASE WHEN State = 'MA' THEN 1 ELSE 0 END as STATE_MA,
CASE WHEN State = 'MI' THEN 1 ELSE 0 END as STATE_MI,
CASE WHEN State = 'MN' THEN 1 ELSE 0 END as STATE_MN,
CASE WHEN State = 'MS' THEN 1 ELSE 0 END as STATE_MS,
CASE WHEN State = 'MO' THEN 1 ELSE 0 END as STATE_MO,
CASE WHEN State = 'MT' THEN 1 ELSE 0 END as STATE_MT,
CASE WHEN State = 'NE' THEN 1 ELSE 0 END as STATE_NE,
CASE WHEN State = 'NV' THEN 1 ELSE 0 END as STATE_NV,
CASE WHEN State = 'NH' THEN 1 ELSE 0 END as STATE_NH,
CASE WHEN State = 'NJ' THEN 1 ELSE 0 END as STATE_NJ,
CASE WHEN State = 'NM' THEN 1 ELSE 0 END as STATE_NM,
CASE WHEN State = 'NY' THEN 1 ELSE 0 END as STATE_NY,
CASE WHEN State = 'NC' THEN 1 ELSE 0 END as STATE_NC,
CASE WHEN State = 'ND' THEN 1 ELSE 0 END as STATE_ND,
CASE WHEN State = 'OH' THEN 1 ELSE 0 END as STATE_OH,
CASE WHEN State = 'OK' THEN 1 ELSE 0 END as STATE_OK,
CASE WHEN State = 'OR' THEN 1 ELSE 0 END as STATE_OR,
CASE WHEN State = 'PA' THEN 1 ELSE 0 END as STATE_PA,
CASE WHEN State = 'RI' THEN 1 ELSE 0 END as STATE_RI,
CASE WHEN State = 'SC' THEN 1 ELSE 0 END as STATE_SC,
CASE WHEN State = 'SD' THEN 1 ELSE 0 END as STATE_SD,
CASE WHEN State = 'TN' THEN 1 ELSE 0 END as STATE_TN,
CASE WHEN State = 'TX' THEN 1 ELSE 0 END as STATE_TX,
CASE WHEN State = 'UT' THEN 1 ELSE 0 END as STATE_UT,
CASE WHEN State = 'VT' THEN 1 ELSE 0 END as STATE_VT,
CASE WHEN State = 'VA' THEN 1 ELSE 0 END as STATE_VA,
CASE WHEN State = 'WA' THEN 1 ELSE 0 END as STATE_WA,
CASE WHEN State = 'WV' THEN 1 ELSE 0 END as STATE_WV,
CASE WHEN State = 'WI' THEN 1 ELSE 0 END as STATE_WI,
CASE WHEN State = 'WY' THEN 1 ELSE 0 END as STATE_WY
FROM BABYTABLE;
Result:
Babyname | State | State_AL | State_AK | … | State_CO | Qty |
Alice | AL | 1 | 0 | … | 0 | 156 |
Alice | AK | 0 | 1 | … | 0 | 146 |
Alice | [unclear] | 0 | 0 | … | 0 | 654 |
… | … | … | … | … | … | … |
Zelda | NY | 0 | 0 | … | 0 | 417 |
Zelda | AL | 1 | 0 | … | 0 | 261 |
Zelda | CO | 0 | 0 | … | 1 | 321 |
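Since the tedious part is typing the CASE expressions, one option is to generate them. The following is only a sketch: one_hot_sql is a hypothetical helper, the table is a small SQLite stand-in for BABYTABLE, and the values are interpolated straight into the SQL text, which is acceptable only for trusted category values such as a fixed list of state codes.

```python
import sqlite3


def one_hot_sql(table, column, values):
    """Build a SELECT with one CASE expression per category value.

    Hypothetical helper: identifiers and values are pasted into the SQL
    text, so only pass trusted inputs.
    """
    cases = [
        f"CASE WHEN {column} = '{v}' THEN 1 ELSE 0 END AS {column.upper()}_{v}"
        for v in values
    ]
    return "SELECT *,\n  " + ",\n  ".join(cases) + f"\nFROM {table}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE babytable (babyname TEXT, state TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO babytable VALUES (?, ?, ?)",
    [
        ("Alice", "AL", 156), ("Alice", "AK", 146),
        ("Zelda", "NY", 417), ("Zelda", "AL", 261), ("Zelda", "CO", 321),
    ],
)

# Let the database tell us which categories exist, then generate the query.
states = [r[0] for r in conn.execute("SELECT DISTINCT state FROM babytable ORDER BY state")]
sql = one_hot_sql("babytable", "state", states)
print(sql)

encoded = conn.execute(sql).fetchall()
for row in encoded:
    print(row)
```

Each output row carries the original three columns followed by one 0/1 column per state, in the alphabetical order returned by the DISTINCT query.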
When doing market basket analysis or mining for association rules, the first step is often formatting the data to aggregate each transaction into a single record. This can be challenging for your laptop, but your data warehouse is designed to crunch this data efficiently.
Typical transaction data:
SALESORDERNUMBER | CUSTOMERKEY | ENGLISHPRODUCTNAME | LISTPRICE | WEIGHT | ORDERDATE |
SO51247 | 11249 | Mountain-200 Black | 2294.99 | 23.77 | 2013-01-01 |
SO51247 | 11249 | Water Bottle - 30 oz. | 4.99 | | 2013-01-01 |
SO51247 | 11249 | Mountain Bottle Cage | 9.99 | | 2013-01-01 |
SO51246 | 25625 | Sport-100 Helmet | 34.99 | | 2012-12-31 |
SO51246 | 25625 | Water Bottle - 30 oz. | 4.99 | | 2012-12-31 |
SO51246 | 25625 | Road Bottle Cage | 8.99 | | 2012-12-31 |
SO51246 | 25625 | Touring-1000 Blue | 2384.07 | 25.42 | 2012-12-31 |
WITH order_detail as (
SELECT
SALESORDERNUMBER,
listagg(ENGLISHPRODUCTNAME, ', ') WITHIN group (
order by
ENGLISHPRODUCTNAME
) as ENGLISHPRODUCTNAME_listagg,
COUNT(ENGLISHPRODUCTNAME) as num_products
FROM
transactions
GROUP BY
SALESORDERNUMBER
)
SELECT
ENGLISHPRODUCTNAME_listagg,
count(SALESORDERNUMBER) as NumTransactions
FROM
order_detail
where
num_products > 1
GROUP BY
ENGLISHPRODUCTNAME_listagg
order by
count(SALESORDERNUMBER) desc;
Result:
NUMTRANSACTIONS | ENGLISHPRODUCTNAME_LISTAGG |
207 | Mountain Bottle Cage, Water Bottle - 30 oz. |
200 | Mountain Tire Tube, Patch Kit/8 Patches |
142 | LL Road Tire, Patch Kit/8 Patches |
137 | Patch Kit/8 Patches, Road Tire Tube |
135 | Patch Kit/8 Patches, Touring Tire Tube |
132 | HL Mountain Tire, Mountain Tire Tube, Patch Kit/8 Patches |
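LISTAGG is not available everywhere, so here is a rough equivalent for SQLite, run from Python. It is a sketch under a few assumptions: the transactions table is a tiny made-up sample, group_concat is used as a window function so that the product names are concatenated in a defined alphabetical order, and the SQLite build is 3.25 or later (the first version with window functions).

```python
import sqlite3

# Tiny made-up sample: two identical baskets, one different, one single-item order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (salesordernumber TEXT, englishproductname TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("SO1", "Water Bottle - 30 oz."), ("SO1", "Mountain Bottle Cage"),
        ("SO2", "Mountain Bottle Cage"), ("SO2", "Water Bottle - 30 oz."),
        ("SO3", "Road Tire Tube"), ("SO3", "Patch Kit/8 Patches"),
        ("SO4", "Sport-100 Helmet"),
    ],
)

BASKET_SQL = """
WITH order_detail AS (
    -- every row of an order gets the same full, sorted basket string;
    -- DISTINCT then leaves one row per order
    SELECT DISTINCT
        salesordernumber,
        group_concat(englishproductname, ', ') OVER (
            PARTITION BY salesordernumber
            ORDER BY englishproductname
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS basket,
        COUNT(*) OVER (PARTITION BY salesordernumber) AS num_products
    FROM transactions
)
SELECT basket, COUNT(*) AS num_transactions
FROM order_detail
WHERE num_products > 1
GROUP BY basket
ORDER BY num_transactions DESC, basket
"""

baskets = conn.execute(BASKET_SQL).fetchall()
for row in baskets:
    print(row)
```

Sorting the names inside each basket matters: without it, the same two products bought in a different order would be counted as two different baskets.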
Time-series aggregations are used not only by data scientists but also for analytics. What makes them difficult is that window functions require the data to be formatted correctly.
For example, if you want to calculate the average sales amount over the past 14 days, window functions require all sales data to be broken out into one row per day. Unfortunately, anyone who has worked with sales data before knows that it is usually stored at the transaction level. This is where time-series aggregation comes in handy. You can create aggregated historical metrics without reformatting the entire dataset. It also helps if we want to add several metrics at once, such as the average sales over the past 14 days, the maximum purchase over the past 6 months, and the count of distinct product types over the past 90 days.
If you wanted to use window functions, each metric would have to be built independently in several steps.
A better way to handle this is to use common table expressions (CTEs) to define each of the pre-aggregated historical windows.
For example:
TransactionID | CustomerID | ProductType | PurchaseAmount | TransactionDate |
65432 | 101 | Grocery | 101.14 | 2022-03-01 |
65493 | 101 | Grocery | 98.45 | 2022-04-30 |
65494 | 101 | Automotive | 239.98 | 2022-05-01 |
66789 | 101 | Grocery | 86.55 | 2022-05-22 |
66981 | 101 | Pharmacy | 14 | 2022-06-15 |
67145 | 101 | Grocery | 93.12 | 2022-06-22 |
WITH BASIC_OFFSET_14DAY AS (
SELECT
A.CustomerID,
A.TransactionDate,
AVG(B.PurchaseAmount) as AVG_PURCHASEAMOUNT_PAST14DAY,
MAX(B.PurchaseAmount) as MAX_PURCHASEAMOUNT_PAST14DAY,
COUNT(DISTINCT B.TransactionID) as COUNT_DISTINCT_TRANSACTIONID_PAST14DAY
FROM
My_First_Table A
INNER JOIN My_First_Table B ON A.CustomerID = B.CustomerID
AND 1 = 1
WHERE
B.TransactionDate >= DATEADD(day, -14, A.TransactionDate)
AND B.TransactionDate <= A.TransactionDate
GROUP BY
A.CustomerID,
A.TransactionDate
),
BASIC_OFFSET_90DAY AS (
SELECT
A.CustomerID,
A.TransactionDate,
AVG(B.PurchaseAmount) as AVG_PURCHASEAMOUNT_PAST90DAY,
MAX(B.PurchaseAmount) as MAX_PURCHASEAMOUNT_PAST90DAY,
COUNT(DISTINCT B.TransactionID) as COUNT_DISTINCT_TRANSACTIONID_PAST90DAY
FROM
My_First_Table A
INNER JOIN My_First_Table B ON A.CustomerID = B.CustomerID
AND 1 = 1
WHERE
B.TransactionDate >= DATEADD(day, -90, A.TransactionDate)
AND B.TransactionDate <= A.TransactionDate
GROUP BY
A.CustomerID,
A.TransactionDate
),
BASIC_OFFSET_180DAY AS (
SELECT
A.CustomerID,
A.TransactionDate,
AVG(B.PurchaseAmount) as AVG_PURCHASEAMOUNT_PAST180DAY,
MAX(B.PurchaseAmount) as MAX_PURCHASEAMOUNT_PAST180DAY,
COUNT(DISTINCT B.TransactionID) as COUNT_DISTINCT_TRANSACTIONID_PAST180DAY
FROM
My_First_Table A
INNER JOIN My_First_Table B ON A.CustomerID = B.CustomerID
AND 1 = 1
WHERE
B.TransactionDate >= DATEADD(day, -180, A.TransactionDate)
AND B.TransactionDate <= A.TransactionDate
GROUP BY
A.CustomerID,
A.TransactionDate
)
SELECT
src.*,
BASIC_OFFSET_14DAY.AVG_PURCHASEAMOUNT_PAST14DAY,
BASIC_OFFSET_14DAY.MAX_PURCHASEAMOUNT_PAST14DAY,
BASIC_OFFSET_14DAY.COUNT_DISTINCT_TRANSACTIONID_PAST14DAY,
BASIC_OFFSET_90DAY.AVG_PURCHASEAMOUNT_PAST90DAY,
BASIC_OFFSET_90DAY.MAX_PURCHASEAMOUNT_PAST90DAY,
BASIC_OFFSET_90DAY.COUNT_DISTINCT_TRANSACTIONID_PAST90DAY,
BASIC_OFFSET_180DAY.AVG_PURCHASEAMOUNT_PAST180DAY,
BASIC_OFFSET_180DAY.MAX_PURCHASEAMOUNT_PAST180DAY,
BASIC_OFFSET_180DAY.COUNT_DISTINCT_TRANSACTIONID_PAST180DAY
FROM
My_First_Table src
LEFT OUTER JOIN BASIC_OFFSET_14DAY ON BASIC_OFFSET_14DAY.TransactionDate = src.TransactionDate
AND BASIC_OFFSET_14DAY.CustomerID = src.CustomerID
LEFT OUTER JOIN BASIC_OFFSET_90DAY ON BASIC_OFFSET_90DAY.TransactionDate = src.TransactionDate
AND BASIC_OFFSET_90DAY.CustomerID = src.CustomerID
LEFT OUTER JOIN BASIC_OFFSET_180DAY ON BASIC_OFFSET_180DAY.TransactionDate = src.TransactionDate
AND BASIC_OFFSET_180DAY.CustomerID = src.CustomerID;
Result:
TransactionID | CustomerID | ProductType | PurchaseAmount | TransactionDate | Avg Sales Past 14 Days | Max Purchase Past 6 Months | Count Distinct Product Types Past 90 Days |
65432 | 101 | Grocery | 101.14 | 2022-03-01 | 101.14 | 101.14 | 1 |
65493 | 101 | Grocery | 98.45 | 2022-04-30 | 98.45 | 101.14 | 2 |
65494 | 101 | Automotive | 239.98 | 2022-05-01 | 169.21 | 239.98 | 2 |
66789 | 101 | Grocery | 86.55 | 2022-05-22 | 86.55 | 239.98 | 2 |
66981 | 101 | Pharmacy | 14 | 2022-06-15 | 14 | 239.98 | 3 |
67145 | 101 | Grocery | 93.12 | 2022-06-22 | 53.56 | 239.98 | 3 |
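To see the self-join window at work on the sample rows, here is a small sketch in Python with SQLite. It reproduces only the 14-day window, uses SQLite's date() function in place of DATEADD, and assumes snake_case column names and ISO-formatted date strings (so plain string comparison orders dates correctly).

```python
import sqlite3

# Sample transactions from the article, stored with ISO date strings.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "transaction_id INTEGER, customer_id INTEGER, product_type TEXT, "
    "purchase_amount REAL, transaction_date TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (65432, 101, "Grocery", 101.14, "2022-03-01"),
        (65493, 101, "Grocery", 98.45, "2022-04-30"),
        (65494, 101, "Automotive", 239.98, "2022-05-01"),
        (66789, 101, "Grocery", 86.55, "2022-05-22"),
        (66981, 101, "Pharmacy", 14.0, "2022-06-15"),
        (67145, 101, "Grocery", 93.12, "2022-06-22"),
    ],
)

# For every transaction (a), collect the same customer's transactions (b)
# from the 14 days up to and including a's date, then aggregate them.
WINDOW_SQL = """
SELECT
    a.transaction_date,
    AVG(b.purchase_amount)           AS avg_past_14d,
    MAX(b.purchase_amount)           AS max_past_14d,
    COUNT(DISTINCT b.transaction_id) AS n_past_14d
FROM transactions a
JOIN transactions b
  ON a.customer_id = b.customer_id
 AND b.transaction_date >= date(a.transaction_date, '-14 days')
 AND b.transaction_date <= a.transaction_date
GROUP BY a.customer_id, a.transaction_date
ORDER BY a.transaction_date
"""

metrics = {r[0]: r[1:] for r in conn.execute(WINDOW_SQL)}
for day, values in metrics.items():
    print(day, values)
```

For 2022-05-01 the window picks up the purchase from the day before as well, which is why the average lands between 98.45 and 239.98.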
I hope this article helps shed light on the different problems a data practitioner runs into when working with the modern data stack. SQL is a double-edged sword when it comes to querying a cloud warehouse. While centralizing compute in a cloud data warehouse increases speed, it sometimes requires additional SQL skills. I hope this piece has helped answer questions and provided the syntax and background needed to tackle these problems.
Source: https://www.kdnuggets.com
1594369800
SQL stands for Structured Query Language. SQL is a language designed to store, manipulate, and query data held in relational databases. The first incarnation of SQL appeared in 1974, when a group at IBM developed the first prototype of a relational database. The first commercial relational database was released by Relational Software, which later became Oracle.
Standards for SQL exist. However, the SQL that can be used on each of the major RDBMSs today comes in different flavors. This is due to two reasons:
1. The SQL command standard is fairly complex, and it is not practical to implement the entire standard.
2. Every database vendor needs a way to differentiate its product from others.
Where appropriate, these differences are noted.
#programming books #beginning sql pdf #commands sql #download free sql full book pdf #introduction to sql pdf #introduction to sql ppt #introduction to sql #practical sql pdf #sql commands pdf with examples free download #sql commands #sql free bool download #sql guide #sql language #sql pdf #sql ppt #sql programming language #sql tutorial for beginners #sql tutorial pdf #sql #structured query language pdf #structured query language ppt #structured query language
1596441660
When you develop large chunks of T-SQL code in SQL Server Management Studio (SSMS), it is essential to test the “live” behavior of your code by making sure that each small piece works correctly and by being able to locate any error message that may cause a failure within that code.
The easiest way to do that is to use the T-SQL debugger, which used to be built into SQL Server Management Studio. But since the T-SQL debugger was removed completely from SSMS 18 and later versions, we need a replacement. We cannot keep using old versions of SSMS just for the T-SQL Debugger without “enjoying” the new features and bug fixes released in newer SSMS versions.
If you plan to wait for SSMS to bring back the T-SQL Debugger, vote for the “Put Debugger back into SSMS 18” request to ask Microsoft to reintroduce it.
As for me, I searched for an alternative to the built-in SSMS T-SQL Debugger and found that Devart had added a new T-SQL Debugger feature in version 6.4 of its SQL Complete tool. SQL Complete is an add-in for Visual Studio and SSMS that offers script autocompletion capabilities, which help you develop and debug your SQL database project.
The SQL Debugger feature of SQL Complete allows you to check the execution of your scripts, procedures, functions, and triggers step by step by adding breakpoints to the lines where you plan to start, suspend, evaluate, step through, and then to continue the execution of your script.
You can download SQL Complete from the dbForge Download page and install it on your machine using a straightforward installation wizard. The wizard will ask you to specify the installation path for SQL Complete and, from the versions of SSMS and Visual Studio installed on your machine, the ones you want to add SQL Complete to as an add-in.
Once SQL Complete is installed, the dbForge SQL Complete installation wizard will tell you whether the installation completed successfully or ran into a specific issue that you can troubleshoot and fix. If there are no issues, the wizard will offer to open SSMS so you can start using SQL Complete.
When you open SSMS, you will see a new “Debug” tools menu, under which you can navigate the SQL Debugger feature options. Besides, you will see a list of icons that will be used to control the debug mode of the T-SQL query at the leftmost side of the SSMS tool. If you cannot see the list, you can go to View -> Toolbars -> Debugger to make these icons visible.
During the debugging session, the SQL Debugger icons will be as follows:
The functionality of these icons within the SQL Debugger can be summarized as:
#sql server #sql #sql debugger #sql server #sql server stored procedure #ssms #t-sql queries
1596448980
Let’s say the chief credit and collections officer asks you to list the names of people, their unpaid balances per month, and the current running balance, and wants you to import this data into Excel. The purpose is to analyze the data and come up with an offer that makes payments lighter to mitigate the effects of the COVID-19 pandemic.
Do you opt to use a query and a nested subquery or a join? What decision will you make?
Before we do a deep dive into syntax, performance impact, and caveats, why not define a subquery first?
In the simplest terms, a subquery is a query within a query. While a query that embodies a subquery is the outer query, we refer to a subquery as the inner query or inner select. And parentheses enclose a subquery similar to the structure below:
SELECT
col1
,col2
,(subquery) as col3
FROM table1
[JOIN table2 ON table1.col1 = table2.col2]
WHERE col1 <operator> (subquery)
We are going to look at the following points in this post:
As is customary, we provide examples and illustrations to enhance understanding. But bear in mind that the main focus of this post is on subqueries in SQL Server.
Now, let’s get started.
For one thing, subqueries are categorized based on their dependency on the outer query.
Let me describe what a self-contained subquery is.
Self-contained subqueries (or sometimes referred to as non-correlated or simple subqueries) are independent of the tables in the outer query. Let me illustrate this:
-- Get sales orders of customers from Southwest United States
-- (TerritoryID = 4)
USE [AdventureWorks]
GO
SELECT CustomerID, SalesOrderID
FROM Sales.SalesOrderHeader
WHERE CustomerID IN (SELECT [CustomerID]
FROM [AdventureWorks].[Sales].[Customer]
WHERE TerritoryID = 4)
As demonstrated in the above code, the subquery (enclosed in parentheses) has no references to any column in the outer query. Additionally, you can highlight the subquery in SQL Server Management Studio and execute it without getting any runtime errors.
This, in turn, makes self-contained subqueries easier to debug.
The next thing to consider is correlated subqueries. Compared to its self-contained counterpart, this one has at least one column being referenced from the outer query. To clarify, I will provide an example:
USE [AdventureWorks]
GO
SELECT DISTINCT p.LastName, p.FirstName, e.BusinessEntityID
FROM Person.Person AS p
JOIN HumanResources.Employee AS e ON p.BusinessEntityID = e.BusinessEntityID
WHERE 1262000.00 IN
(SELECT [SalesQuota]
FROM Sales.SalesPersonQuotaHistory spq
WHERE p.BusinessEntityID = spq.BusinessEntityID)
Were you attentive enough to notice the reference to BusinessEntityID from the Person table? Well done!
Once a column from the outer query is referenced in the subquery, it becomes a correlated subquery. One more point to consider: if you highlight a subquery and execute it, an error will occur.
And yes, you are absolutely right: this makes correlated subqueries considerably harder to debug.
To make debugging possible, isolate the subquery and replace the reference to the outer query with a constant value.
Isolating the subquery for debugging will make it look like this:
SELECT [SalesQuota]
FROM Sales.SalesPersonQuotaHistory spq
WHERE spq.BusinessEntityID = <constant value>
Now, let’s dig a little deeper into the output of subqueries.
Well, first, let’s think about what returned values we can expect from SQL subqueries.
In fact, there are 3 possible outcomes: a single value, multiple values, or a whole table.
Let’s start with single-valued output. This type of subquery can appear anywhere in the outer query where an expression is expected, like the WHERE clause.
-- Output a single value which is the maximum or last TransactionID
USE [AdventureWorks]
GO
SELECT TransactionID, ProductID, TransactionDate, Quantity
FROM Production.TransactionHistory
WHERE TransactionID = (SELECT MAX(t.TransactionID)
FROM Production.TransactionHistory t)
When you use a MAX() function, you retrieve a single value. That’s exactly what happened to our subquery above. Using the equal (=) operator tells SQL Server that you expect a single value. Another thing: if the subquery returns multiple values using the equals (=) operator, you get an error, similar to the one below:
Msg 512, Level 16, State 1, Line 20
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
Next, we examine the multi-valued output. This kind of subquery returns a list of values with a single column. Additionally, operators like IN and NOT IN will expect one or more values.
-- Output multiple values which is a list of customers with lastnames that start with 'I'
USE [AdventureWorks]
GO
SELECT [SalesOrderID], [OrderDate], [ShipDate], [CustomerID]
FROM Sales.SalesOrderHeader
WHERE [CustomerID] IN (SELECT c.[CustomerID] FROM Sales.Customer c
INNER JOIN Person.Person p ON c.PersonID = p.BusinessEntityID
WHERE p.lastname LIKE N'I%' AND p.PersonType='SC')
And last but not least, why not delve into whole table outputs.
-- Output a table of values based on sales orders
USE [AdventureWorks]
GO
SELECT [ShipYear],
COUNT(DISTINCT [CustomerID]) AS CustomerCount
FROM (SELECT YEAR([ShipDate]) AS [ShipYear], [CustomerID]
FROM Sales.SalesOrderHeader) AS Shipments
GROUP BY [ShipYear]
ORDER BY [ShipYear]
Have you noticed the FROM clause?
Instead of using a table, it used a subquery. This is called a derived table or a table subquery.
And now, let me present some ground rules for using this sort of query:
In this case, a derived table has the benefits of a physical table. That’s why in our example, we can use COUNT() in one of the columns of the derived table.
That’s about all regarding subquery outputs. But before we get any further, you may have noticed that the logic behind the example for multiple values and others as well can also be done using a JOIN.
-- Output multiple values which is a list of customers with lastnames that start with 'I'
USE [AdventureWorks]
GO
SELECT o.[SalesOrderID], o.[OrderDate], o.[ShipDate], o.[CustomerID]
FROM Sales.SalesOrderHeader o
INNER JOIN Sales.Customer c on o.CustomerID = c.CustomerID
INNER JOIN Person.Person p ON c.PersonID = p.BusinessEntityID
WHERE p.LastName LIKE N'I%' AND p.PersonType = 'SC'
In fact, the output will be the same. But which one performs better?
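You can convince yourself of this on a toy dataset. The sketch below uses SQLite from Python rather than SQL Server, and the customers and orders tables are hypothetical stand-ins for the AdventureWorks tables. It only compares the output of the two forms, not their performance, and the two match here because customer_id is unique in customers, so the JOIN cannot duplicate order rows.

```python
import sqlite3

# Hypothetical miniature schema: each customer_id appears once in customers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, last_name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ito"), (2, "Smith"), (3, "Ingram")])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 2), (12, 3), (13, 1)])

# Form 1: multi-valued subquery with IN.
SUBQUERY_SQL = """
SELECT o.order_id
FROM orders o
WHERE o.customer_id IN (SELECT c.customer_id
                        FROM customers c
                        WHERE c.last_name LIKE 'I%')
ORDER BY o.order_id
"""

# Form 2: the same filter expressed as a JOIN.
JOIN_SQL = """
SELECT o.order_id
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE c.last_name LIKE 'I%'
ORDER BY o.order_id
"""

via_subquery = conn.execute(SUBQUERY_SQL).fetchall()
via_join = conn.execute(JOIN_SQL).fetchall()
print(via_subquery)
print(via_join)
```

If the joined table could hold several rows per customer, the JOIN form would repeat orders while the IN form would not, so the equivalence depends on that uniqueness.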
Before we get into that, let me tell you that I have dedicated a section to this hot topic. We’ll examine it with complete execution plans and have a look at illustrations.
So, bear with me for a moment. Let’s discuss another way to place your subqueries.
#sql server #sql query #sql server #sql subqueries #t-sql statements #sql
1621850444
When working in SQL Server, we may need to check databases other than the one we are currently working in. In that scenario, we may not be sure whether we have access to those databases. In this article, we discuss how to list the databases that are available to the currently logged-in user in SQL Server.
#sql server #available databases for current user #check database has access #list of available database #sql #sql query #sql server database #sql tips #sql tips and tricks #tips
1603760400
This article introduces the concept of recursive SQL. Recursive CTEs are really cool: we will see that they can often simplify our code and avoid a cascade of SQL queries!
Recursive queries are used to query hierarchical data. They avoid a cascade of SQL queries: a single query is enough to retrieve the hierarchical data.
First, what is a CTE? A CTE (Common Table Expression) is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. For example, you can use CTE when, in a query, you will use the same subquery more than once.
A recursive CTE is one having a subquery that refers to its own name!
Recursive CTE is defined in the SQL standard.
A recursive CTE has this structure:
In this example, we use hierarchical data. Each row can have zero or one parent, and its parent can also have a parent, and so on.
Create table test (id integer, parent_id integer);
insert into test (id, parent_id) values (1, null);
insert into test (id, parent_id) values (11, 1);
insert into test (id, parent_id) values (111, 11);
insert into test (id, parent_id) values (112, 11);
insert into test (id, parent_id) values (12, 1);
insert into test (id, parent_id) values (121, 12);
For example, the row with id 111 has as ancestors: 11 and 1.
Before I knew about recursive CTEs, I ran several queries to get all the ancestors of a row.
For example, to retrieve all the ancestors of the row with id 111:
X = 111
While (X has a parent)
    Select id, parent_id from test where id = X
    X = parent_id
With recursive CTE, we can retrieve all ancestors of a row with only one SQL query :)
WITH RECURSIVE cte_test AS (
SELECT id, parent_id FROM test WHERE id = 111
UNION
SELECT test.id, test.parent_id FROM test JOIN cte_test ON test.id = cte_test.parent_id
) SELECT * FROM cte_test
Explanations:
WITH RECURSIVE indicates that the CTE is recursive.
The first SELECT (before UNION) is the initial query.
The second SELECT (after UNION) is the recursive expression: it joins the table with the current content of the CTE!
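SQLite also supports WITH RECURSIVE, so the example can be replayed locally with a few lines of Python. This is a sketch that walks upward by joining each row already found to its parent (test.id = cte_test.parent_id); note that the starting row 111 is part of the result alongside its ancestors.

```python
import sqlite3

# Same hierarchy as in the article: 1 -> 11 -> (111, 112), 1 -> 12 -> 121.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO test (id, parent_id) VALUES (?, ?)",
    [(1, None), (11, 1), (111, 11), (112, 11), (12, 1), (121, 12)],
)

ANCESTORS_SQL = """
WITH RECURSIVE cte_test AS (
    -- initial query: the row we start from
    SELECT id, parent_id FROM test WHERE id = 111
    UNION
    -- recursive step: fetch the parent of every row found so far
    SELECT test.id, test.parent_id
    FROM test
    JOIN cte_test ON test.id = cte_test.parent_id
)
SELECT id, parent_id FROM cte_test
"""

lineage = sorted(conn.execute(ANCESTORS_SQL).fetchall())
print(lineage)
```

The recursion stops on its own at the root, because the row with id 1 has a NULL parent_id and the join finds nothing more to add.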
#sql #database #sql-server #sql-injection #writing-sql-queries #sql-beginner-tips #better-sql-querying-tips #sql-top-story