Pereira Dias

Querying MongoDB Like an SQL DB when using Aggregation Pipeline

What Are Aggregations?

Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result.

In the db.collection.aggregate method and [db.aggregate](https://docs.mongodb.com/manual/reference/method/db.aggregate/#db.aggregate) method, pipeline stages appear in an array. Documents pass through the stages in sequence. We will go through some of these stages to achieve relational-DB-like results.
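
As a preview of where we are headed, here is a hedged sketch (using the articles collection introduced below) of how a multi-stage pipeline lines up with the clauses of an SQL statement:

db.articles.aggregate([
    { $match: { author: "dave" } },  // WHERE author = 'dave'
    { $sort: { views: -1 } },        // ORDER BY views DESC
    { $skip: 0 },                    // OFFSET 0
    { $limit: 2 }                    // LIMIT 2
]);

Each stage receives the documents emitted by the previous stage, so the order of the stages matters.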

$match (WHERE)

Filters the documents to pass only the documents that match the specified condition(s) to the next pipeline stage.

It has the following prototype:

{ $match: { <query> } }

It is the equivalent of WHERE in SQL queries. Let us take an example to make things clear. This example uses a collection named articles with the following documents:

{ "_id" : ObjectId("512bc95fe835e68f199c8686"), "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("512bc962e835e68f199c8687"), "author" : "dave", "score" : 85, "views" : 521 }
{ "_id" : ObjectId("55f5a192d4bede9ac365b257"), "author" : "ahn", "score" : 60, "views" : 1000 }
{ "_id" : ObjectId("55f5a192d4bede9ac365b258"), "author" : "li", "score" : 55, "views" : 5000 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b259"), "author" : "annT", "score" : 60, "views" : 50 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b25a"), "author" : "li", "score" : 94, "views" : 999 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b25b"), "author" : "ty", "score" : 95, "views" : 1000 }

Equality match.

db.articles.aggregate(
    [ { $match : { author : "dave" } } ]
);

// Result
{ "_id" : ObjectId("512bc95fe835e68f199c8686"), "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("512bc962e835e68f199c8687"), "author" : "dave", "score" : 85, "views" : 521 }

We can have multiple conditions inside $match, combined with operators like $or, $and, etc. according to our requirements, although $match has some restrictions as well. You can read about them in the MongoDB docs.
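
For instance, a minimal sketch (with hypothetical score values) combining $or with a range condition, roughly equivalent to WHERE (author = 'dave' OR author = 'li') AND score >= 60:

db.articles.aggregate([
    { $match: {
        $or: [ { author: "dave" }, { author: "li" } ],  // author = 'dave' OR author = 'li'
        score: { $gte: 60 }                             // AND score >= 60
    }}
]);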

$skip (OFFSET)

Skips over the specified number of documents that pass into the stage and passes the remaining documents to the next stage in the pipeline. It has the following prototype:

{ $skip: <positive integer> }

In the above example, we were able to match the records related to dave. If we want to skip a few results from the beginning, we would write the query as:

db.articles.aggregate([ 
    { $match : { author : "dave" } },
    { $skip: 1 } 
]);

// We are skipping 1 result and we should get just this

{ "_id" : ObjectId("5dc1d22f24a8e913bfcf4f60"), "author" : "dave", "score" : 85, "views" : 521 }

We generally use $skip with $limit to paginate data. Let’s insert a few more records into our collection and see how $skip and $limit work together in the next section.

$limit (LIMIT)

Limits the number of documents passed to the next stage in the pipeline. It has the following prototype:

{ $limit: <positive integer> }
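
For completeness, the two extra records for dave (values taken from the output shown below) could have been inserted with something like this; MongoDB generates the ObjectIds automatically:

db.articles.insertMany([
    { author: "dave", score: 185, views: 1521 },
    { author: "dave", score: 15, views: 21 }
]);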

We have added new records for dave; let’s see how the collection looks with a simple $match:

db.articles.aggregate(
    [ { $match : { author : "dave" } } ]
);

{ "_id" : ObjectId("5dc1d22124a8e913bfcf4f5f"), "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("5dc1d22f24a8e913bfcf4f60"), "author" : "dave", "score" : 85, "views" : 521 }
{ "_id" : ObjectId("5dc1d53924a8e913bfcf4f65"), "author" : "dave", "score" : 185, "views" : 1521 }
{ "_id" : ObjectId("5dc1d54f24a8e913bfcf4f66"), "author" : "dave", "score" : 15, "views" : 21 }

We have four matching records in the example collection. Suppose you are asked to paginate the results to show two at a time. How would you go about it? Let’s see.

db.articles.aggregate([ 
    { $match : { author : "dave" } },
    { $skip: 0},
    { $limit: 2}
]);

// We are not skipping any records but limiting the output to 2

{ "_id" : ObjectId("5dc1d22124a8e913bfcf4f5f"), "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("5dc1d22f24a8e913bfcf4f60"), "author" : "dave", "score" : 85, "views" : 521 }

// We got the first two results; to get the next two, just update the $skip

db.articles.aggregate([ 
    { $match : { author : "dave" } },
    { $skip: 2},
    { $limit: 2}
]);

// This should give two records after skipping the first two.

{ "_id" : ObjectId("5dc1d53924a8e913bfcf4f65"), "author" : "dave", "score" : 185, "views" : 1521 }
{ "_id" : ObjectId("5dc1d54f24a8e913bfcf4f66"), "author" : "dave", "score" : 15, "views" : 21 }

We are doing well so far, but suppose your manager comes up to you and asks you to sort the results by views. What are you going to do? We have $sort for that.

$sort (ORDER BY)

Sorts all input documents and returns them to the pipeline in sorted order. It has the following prototype:

{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }

In the prototype, <sort order> can be one of the following:

  • 1 to specify ascending order
  • -1 to specify descending order

Let us use the $sort stage in our pipeline.

db.articles.aggregate([ 
    { $match : { author : "dave" } },
    { $sort: { views: 1}}
]);

// Result

{ "_id" : ObjectId("5dc1d54f24a8e913bfcf4f66"), "author" : "dave", "score" : 15, "views" : 21 }
{ "_id" : ObjectId("5dc1d22124a8e913bfcf4f5f"), "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("5dc1d22f24a8e913bfcf4f60"), "author" : "dave", "score" : 85, "views" : 521 }
{ "_id" : ObjectId("5dc1d53924a8e913bfcf4f65"), "author" : "dave", "score" : 185, "views" : 1521 }

Voilà ! The results are sorted now.
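
The prototype allows multiple sort keys as well. A hedged sketch sorting by views ascending and breaking ties by score descending, roughly ORDER BY views ASC, score DESC:

db.articles.aggregate([
    { $match: { author: "dave" } },
    { $sort: { views: 1, score: -1 } }  // ties on views fall back to score, descending
]);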

Tip: place the $match stage as early in the aggregation pipeline as possible. Because $match limits the total number of documents in the aggregation pipeline, earlier $match operations minimize the amount of processing down the pipe.

$group

Groups input documents by the specified _id expression and, for each distinct grouping, outputs a document.

The _id field of each output document contains the unique group by value. The output documents can also contain computed fields that hold the values of an accumulator expression. It has the following prototype:

{
  $group:
    {
      _id: <expression>, // Group By Expression
      <field1>: { <accumulator1> : <expression1> },
      ...
    }
}

Suppose we want to group the articles by author, in other words, count the number of articles by each author. We can make use of the $group stage in the pipeline. So, let’s see it live:

db.articles.aggregate([ 
    { $group : { _id: "$author", count: { $sum: 1 }}},
    { $sort: { count: 1 }}
]);

// We have grouped the articles by author, counted them, and sorted the result by count

{ "_id" : "annT", "count" : 1 }
{ "_id" : "ahn", "count" : 1 }
{ "_id" : "li", "count" : 2 }
{ "_id" : "dave", "count" : 4 }

// If we want only the groups whose count is greater than 1 (SQL's HAVING clause), we can add a $match stage in the pipeline after $group

db.articles.aggregate([ 
    { $group : { _id: "$author", count: { $sum: 1 }}},
    { $sort: { count: 1 }},
    { $match: { count : { $gt: 1 }}}
]);

{ "_id" : "li", "count" : 2 }
{ "_id" : "dave", "count" : 4 }

Let us take this grouping up a notch. Suppose we want to group by values stored in an array structure. We have something called $unwind. Let’s see how it works.

$unwind

Deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.

You can pass the array field path to $unwind. When using this syntax, $unwind does not output a document if the field value is null, missing, or an empty array. It has the following prototype:

{ $unwind: <field path> }

Let us take a new collection inventory and add a record to it with the following command:

db.inventory.insertOne({ "_id" : 1, "item" : "ABC1", sizes: [ "S", "M", "L"] })

That’s the beauty of MongoDB: you can create a new collection and add a record to it without any setup.

Let us $unwind this by the sizes.

db.inventory.aggregate( [ { $unwind : "$sizes" } ] )

// Result
{ "_id" : 1, "item" : "ABC1", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "L" }

Each document is identical to the input document except for the value of the sizes field which now holds a value from the original sizes array.

Let us take a new collection inventory2 and do a group by size. Use this command to insert the records:

db.inventory2.insertMany([
  { "_id" : 1, "item" : "ABC", price: NumberDecimal("80"), "sizes": [ "S", "M", "L"] },
  { "_id" : 2, "item" : "EFG", price: NumberDecimal("120"), "sizes" : [ ] },
  { "_id" : 3, "item" : "IJK", price: NumberDecimal("160"), "sizes": "M" },
  { "_id" : 4, "item" : "LMN" , price: NumberDecimal("10") },
  { "_id" : 5, "item" : "XYZ", price: NumberDecimal("5.75"), "sizes" : null }
])

If we unwind this, we would get something like this:

db.inventory2.aggregate( [ { $unwind: "$sizes" } ] )

// Results

{ "_id" : 1, "item" : "ABC", "price" : NumberDecimal("80"), "sizes" : "S" }
{ "_id" : 1, "item" : "ABC", "price" : NumberDecimal("80"), "sizes" : "M" }
{ "_id" : 1, "item" : "ABC", "price" : NumberDecimal("80"), "sizes" : "L" }
{ "_id" : 3, "item" : "IJK", "price" : NumberDecimal("160"), "sizes" : "M" }

// Notice it skips the documents whose sizes value is null, missing, or an empty array

Now, to group by size:

db.inventory2.aggregate([ 
    { $unwind: { path: "$sizes" } },
    { $group: { _id: "$sizes", count: { $sum: 1 }}}
]);

// Results

{ "_id" : "M", "count" : 2 }
{ "_id" : "L", "count" : 1 }
{ "_id" : "S", "count" : 1 }

We can apply different stages to this like $match, $sort, $skip, $limit, etc. to get the desired results.

Now, let’s move on to SQL JOINs. To achieve joins in MongoDB, we have $lookup.

$lookup

New in version 3.2.

Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing.

To each input document, the $lookup stage adds a new array field whose elements are the matching documents from the “joined” collection. The $lookup stage passes these reshaped documents to the next stage.

There can be different join conditions but we will be looking into the most basic one, which is an equality match.

Equality match

To perform an equality match between a field from the input documents and a field from the documents of the “joined” collection, the [$lookup](https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#pipe._S_lookup) stage has the following syntax:

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

from: Specifies the collection in the same database to perform the join with.

localField: Specifies the field from the documents input to the $lookup stage. $lookup performs an equality match on the localField to the foreignField from the documents of the from collection. If an input document does not contain the localField, $lookup treats the field as having a value of null for matching purposes.

foreignField: Specifies the field from the documents in the from collection. $lookup performs an equality match on the foreignField to the localField from the input documents. If a document in the from collection does not contain the foreignField, $lookup treats its value as null for matching purposes.

as: Specifies the name of the new array field to add to the input documents. The new array field contains the matching documents from the from collection. If the specified name already exists in the input document, the existing field is overwritten.

($lookup also has a more general form that takes let and pipeline fields, which supports uncorrelated subqueries and join conditions beyond a single equality match; see the MongoDB docs for details.)

Let us look at an example to better understand the terminology:

Perform a single equality join with $lookup

Create a collection orders with the following documents:

db.orders.insert([
   { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
   { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 },
   { "_id" : 3  }
])

Create another collection inventory with the following documents:

db.inventory.insert([
   { "_id" : 1, "sku" : "almonds", description: "product 1", "instock" : 120 },
   { "_id" : 2, "sku" : "bread", description: "product 2", "instock" : 80 },
   { "_id" : 3, "sku" : "cashews", description: "product 3", "instock" : 60 },
   { "_id" : 4, "sku" : "pecans", description: "product 4", "instock" : 70 },
   { "_id" : 5, "sku": null, description: "Incomplete" },
   { "_id" : 6 }
])

The following aggregation operation on the orders collection joins the documents from orders with the documents from the inventory collection using the fields item from the orders collection and the sku field from the inventory collection:

db.orders.aggregate([
   {
     $lookup:
       {
         from: "inventory",
         localField: "item",
         foreignField: "sku",
         as: "inventory_docs"
       }
  }
]);

The operation returns the following documents:

{
   "_id" : 1,
   "item" : "almonds",
   "price" : 12,
   "quantity" : 2,
   "inventory_docs" : [
      { "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 }
   ]
}
{
   "_id" : 2,
   "item" : "pecans",
   "price" : 20,
   "quantity" : 1,
   "inventory_docs" : [
      { "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 }
   ]
}
{
   "_id" : 3,
   "inventory_docs" : [
      { "_id" : 5, "sku" : null, "description" : "Incomplete" },
      { "_id" : 6 }
   ]
}
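
Note the last output document: the order with _id: 3 has no item field, so $lookup treats its value as null and matches it to the inventory documents whose sku is null or missing. In that sense, $lookup behaves like SQL's LEFT OUTER JOIN. To emulate an INNER JOIN instead, a hedged sketch filters out null join keys, flattens the array, and drops empty matches, roughly SELECT * FROM orders o INNER JOIN inventory i ON o.item = i.sku:

db.orders.aggregate([
    { $match: { item: { $ne: null } } },   // SQL joins do not match NULLs
    {
      $lookup:
        {
          from: "inventory",
          localField: "item",
          foreignField: "sku",
          as: "inventory_docs"
        }
    },
    { $unwind: "$inventory_docs" }         // drops orders with no match, one row per match
]);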

Conclusion

This was just a basic overview of using SQL-like queries in MongoDB.

There is a lot more that can be done using many other stages that are available. The best way to have a good grasp of it is by practicing different scenarios and using them in your projects. I hope this will help. Thank you !

#Mongodb #Sql #Database #Programming

Cayla Erdman

Introduction to Structured Query Language SQL pdf

SQL stands for Structured Query Language. SQL is a query language designed to store, manipulate, and query data held in relational databases. The first version of SQL appeared in 1974, when a group at IBM developed the first prototype of a relational database. The first commercial relational database was released by Relational Software (which later became Oracle).

Standards for SQL exist. However, the SQL that can be used on each of the major RDBMSs today comes in different flavors. This is due to two reasons:

1. The SQL standard is quite complex, and it is not practical to implement the entire standard.

2. Every database vendor needs a way to differentiate its product from the others.

Here, differences are noted where appropriate.

#programming books #beginning sql pdf #commands sql #download free sql full book pdf #introduction to sql pdf #introduction to sql ppt #introduction to sql #practical sql pdf #sql commands pdf with examples free download #sql commands #sql free bool download #sql guide #sql language #sql pdf #sql ppt #sql programming language #sql tutorial for beginners #sql tutorial pdf #sql #structured query language pdf #structured query language ppt #structured query language


Cayla Erdman

The Easy Guide on How to Use Subqueries in SQL Server

Let’s say the chief credit and collections officer asks you to list down the names of people, their unpaid balances per month, and the current running balance, and wants you to import this data into Excel. The purpose is to analyze the data and come up with an offer that makes payments lighter, to mitigate the effects of the COVID-19 pandemic.

Do you opt to use a query and a nested subquery or a join? What decision will you make?

SQL Subqueries – What Are They?

Before we do a deep dive into syntax, performance impact, and caveats, why not define a subquery first?

In the simplest terms, a subquery is a query within a query. A query that embodies a subquery is the outer query, while we refer to the subquery as the inner query or inner select. Parentheses enclose a subquery, similar to the structure below:

SELECT 
 col1
,col2
,(subquery) as col3
FROM table1
[JOIN table2 ON table1.col1 = table2.col2]
WHERE col1 <operator> (subquery)

We are going to look at the following points in this post:

  • SQL subquery syntax depending on different subquery types and operators.
  • When and in what sort of statements one can use a subquery.
  • Performance implications vs. JOINs.
  • Common caveats when using SQL subqueries.

As is customary, we provide examples and illustrations to enhance understanding. But bear in mind that the main focus of this post is on subqueries in SQL Server.

Now, let’s get started.

Make SQL Subqueries That Are Self-Contained or Correlated

For one thing, subqueries are categorized based on their dependency on the outer query.

Let me describe what a self-contained subquery is.

Self-contained subqueries (or sometimes referred to as non-correlated or simple subqueries) are independent of the tables in the outer query. Let me illustrate this:

-- Get sales orders of customers from Southwest United States 
-- (TerritoryID = 4)

USE [AdventureWorks]
GO
SELECT CustomerID, SalesOrderID
FROM Sales.SalesOrderHeader
WHERE CustomerID IN (SELECT [CustomerID]
                     FROM [AdventureWorks].[Sales].[Customer]
                     WHERE TerritoryID = 4)

As demonstrated in the above code, the subquery (enclosed in parentheses) has no references to any column in the outer query. Additionally, you can highlight the subquery in SQL Server Management Studio and execute it without getting any runtime errors.

This, in turn, leads to easier debugging of self-contained subqueries.

The next thing to consider is correlated subqueries. Compared to their self-contained counterparts, these reference at least one column from the outer query. To clarify, I will provide an example:

USE [AdventureWorks]
GO
SELECT DISTINCT p.LastName, p.FirstName, e.BusinessEntityID
FROM Person.Person AS p
JOIN HumanResources.Employee AS e ON p.BusinessEntityID = e.BusinessEntityID
WHERE 1262000.00 IN
    (SELECT [SalesQuota]
    FROM Sales.SalesPersonQuotaHistory spq
    WHERE p.BusinessEntityID = spq.BusinessEntityID)

Were you attentive enough to notice the reference to BusinessEntityID from the Person table? Well done!

Once a column from the outer query is referenced in the subquery, it becomes a correlated subquery. One more point to consider: if you highlight the subquery and execute it on its own, an error will occur.

And yes, you are absolutely right: this makes correlated subqueries harder to debug.

To make debugging possible, follow these steps:

  • isolate the subquery.
  • replace the reference to the outer query with a constant value.

Isolating the subquery for debugging will make it look like this:

SELECT [SalesQuota]
    FROM Sales.SalesPersonQuotaHistory spq
    WHERE spq.BusinessEntityID = <constant value>

Now, let’s dig a little deeper into the output of subqueries.

Make SQL Subqueries With 3 Possible Returned Values

Well, first, let’s think of what return values we can expect from SQL subqueries.

In fact, there are 3 possible outcomes:

  • A single value
  • Multiple values
  • Whole tables

Single Value

Let’s start with single-valued output. This type of subquery can appear anywhere in the outer query where an expression is expected, like the WHERE clause.

-- Output a single value which is the maximum or last TransactionID
USE [AdventureWorks]
GO
SELECT TransactionID, ProductID, TransactionDate, Quantity
FROM Production.TransactionHistory
WHERE TransactionID = (SELECT MAX(t.TransactionID) 
                       FROM Production.TransactionHistory t)

When you use a MAX() function, you retrieve a single value. That’s exactly what happened in our subquery above. Using the equals (=) operator tells SQL Server that you expect a single value. Another thing: if the subquery returns multiple values while you use the equals (=) operator, you get an error, similar to the one below:

Msg 512, Level 16, State 1, Line 20
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

Multiple Values

Next, we examine the multi-valued output. This kind of subquery returns a list of values from a single column. Additionally, operators like IN and NOT IN will expect one or more values.

-- Output multiple values: a list of customers with last names that start with 'I'

USE [AdventureWorks]
GO
SELECT [SalesOrderID], [OrderDate], [ShipDate], [CustomerID]
FROM Sales.SalesOrderHeader
WHERE [CustomerID] IN (SELECT c.[CustomerID] FROM Sales.Customer c
INNER JOIN Person.Person p ON c.PersonID = p.BusinessEntityID
WHERE p.lastname LIKE N'I%' AND p.PersonType='SC')

Whole Table Values

And last but not least, let’s delve into whole-table outputs.

-- Output a table of values based on sales orders
USE [AdventureWorks]
GO
SELECT [ShipYear],
COUNT(DISTINCT [CustomerID]) AS CustomerCount
FROM (SELECT YEAR([ShipDate]) AS [ShipYear], [CustomerID] 
      FROM Sales.SalesOrderHeader) AS Shipments
GROUP BY [ShipYear]
ORDER BY [ShipYear]

Have you noticed the FROM clause?

Instead of using a table, it used a subquery. This is called a derived table or a table subquery.

And now, let me present you some ground rules when using this sort of query:

  • All columns in the subquery should have unique names. Much like a physical table, a derived table should have unique column names.
  • ORDER BY is not allowed unless TOP is also specified. That’s because the derived table represents a relational table where rows have no defined order.

In this case, a derived table has the benefits of a physical table. That’s why in our example, we can use COUNT() in one of the columns of the derived table.

That’s about all regarding subquery outputs. But before we get any further, you may have noticed that the logic behind the multiple-values example (and others as well) can also be implemented using a JOIN.

-- Output multiple values which is a list of customers with lastnames that start with 'I'
USE [AdventureWorks]
GO
SELECT o.[SalesOrderID], o.[OrderDate], o.[ShipDate], o.[CustomerID]
FROM Sales.SalesOrderHeader o
INNER JOIN Sales.Customer c on o.CustomerID = c.CustomerID
INNER JOIN Person.Person p ON c.PersonID = p.BusinessEntityID
WHERE p.LastName LIKE N'I%' AND p.PersonType = 'SC'

In fact, the output will be the same. But which one performs better?

Before we get into that, let me tell you that I have dedicated a section to this hot topic. We’ll examine it with complete execution plans and have a look at illustrations.

So, bear with me for a moment. Let’s discuss another way to place your subqueries.

#sql server #sql query #sql server #sql subqueries #t-sql statements #sql

Introduction to Recursive CTE

This article will introduce the concept of recursive SQL. Recursive CTEs are really cool. We will see that they can often simplify our code and avoid a cascade of SQL queries!

Why use a recursive CTE ?

Recursive queries are used to query hierarchical data. They avoid a cascade of SQL queries: you can run only one query to retrieve the hierarchical data.

What is recursive CTE ?

First, what is a CTE? A CTE (Common Table Expression) is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. For example, you can use a CTE when a query uses the same subquery more than once.

A recursive CTE is one whose subquery refers to the CTE’s own name!

Recursive CTEs are defined in the SQL standard.

How to make a recursive CTE?

A recursive CTE has this structure:

  • The WITH clause must begin with “WITH RECURSIVE”.
  • The recursive CTE subquery has two parts, separated by “UNION [ALL]” or “UNION DISTINCT”:
    • The first part produces the initial row(s) for the CTE. This SELECT does not refer to the CTE name.
    • The second part recurses by referring to the CTE name in its FROM clause.

Practice / Example

In this example, we use hierarchical data. Each row can have zero or one parent, and its parent can also have a parent, and so on.

Create table test (id integer, parent_id integer);

insert into test (id, parent_id) values (1, null);

insert into test (id, parent_id) values (11, 1);
insert into test (id, parent_id) values (111, 11);

insert into test (id, parent_id) values (112, 11);

insert into test (id, parent_id) values (12, 1);

insert into test (id, parent_id) values (121, 12);

For example, the row with id 111 has as ancestors: 11 and 1.

Before learning about recursive CTEs, I used to run several queries to get all the ancestors of a row.

For example, to retrieve all the ancestors of the row with id 111.

-- Pseudocode: one query per level, repeated while the current row has a parent
While (has parent)

	Select id, parent_id from test where id = <current id>

With recursive CTE, we can retrieve all ancestors of a row with only one SQL query :)

WITH RECURSIVE cte_test AS (
	SELECT id, parent_id FROM test WHERE id = 111
	UNION 
	SELECT test.id, test.parent_id FROM test JOIN cte_test ON cte_test.id = test.parent_id

) SELECT * FROM cte_test

Explanations:

  • “WITH RECURSIVE”:

It indicates that we are writing a recursive query.

  • “SELECT id, parent_id FROM test WHERE id = 111”:

It is the initial query.

  • “UNION … JOIN cte_test”:

It is the recursive part! We join the table with the current CTE!


#sql #database #sql-server #sql-injection #writing-sql-queries #sql-beginner-tips #better-sql-querying-tips #sql-top-story

Cayla Erdman

Welcome Back the T-SQL Debugger with SQL Complete – SQL Debugger

When you develop large chunks of T-SQL code with the help of the SQL Server Management Studio tool, it is essential to test the “live” behavior of your code by making sure that each small piece of code works fine and by being able to locate any error that may cause a failure within that code.

The easiest way to perform that would be to use the T-SQL debugger feature, which used to be built into the SQL Server Management Studio tool. But since the T-SQL debugger feature was removed completely from SQL Server Management Studio 18 and later versions, we need a replacement for it: we cannot keep using old versions of SSMS just for the T-SQL Debugger feature while forgoing the new features and bug fixes released in newer SSMS versions.

If you plan to wait for SSMS to bring back the T-SQL Debugger feature, vote for the “Put Debugger back into SSMS 18” suggestion to ask Microsoft to reintroduce it.

As for me, I searched for an alternative to the T-SQL Debugger SSMS built-in feature and found that Devart rolled out a new T-SQL Debugger feature in version 6.4 of its SQL Complete tool. SQL Complete is an add-in for Visual Studio and SSMS that offers script autocompletion capabilities, which help develop and debug your SQL database project.

The SQL Debugger feature of SQL Complete allows you to check the execution of your scripts, procedures, functions, and triggers step by step: you add breakpoints at the lines where you plan to start, suspend and evaluate, step through, and then continue the execution of your script.

You can download SQL Complete from the dbForge Download page and install it on your machine using a straightforward installation wizard. The wizard will ask you to specify the installation path for the SQL Complete tool and the versions of SSMS and Visual Studio that you plan to install SQL Complete on, as an add-in, from the versions that are installed on your machine.

Once SQL Complete is fully installed on your machine, the dbForge SQL Complete installation wizard will notify you whether the installation completed successfully or whether it faced any specific issue that you can troubleshoot and fix easily. If there are no issues, the wizard will provide you with an option to open the SSMS tool and start using SQL Complete.

When you open SSMS, you will see a new “Debug” menu, under which you can navigate the SQL Debugger feature options. Besides, you will see a list of icons used to control the debug mode of the T-SQL query at the leftmost side of the SSMS tool. If you cannot see the list, you can go to View -> Toolbars -> Debugger to make these icons visible.

The functionality of the SQL Debugger icons during a debugging session can be summarized as follows:

  • Breakpoints pause the execution of the T-SQL script at a specific statement, allowing you to check debugging information such as the values of parameters and variables.
  • Step Into navigates through the script statements one by one, allowing you to check how each statement behaves.
  • Step Over executes a stored procedure call as a single step if you are sure that it contains no error.
  • Step Out returns from the stored procedure, function, or trigger to the main debugging window.
  • Continue resumes execution of the script until reaching the next breakpoint.
  • Stop Debugging terminates the debugging session.
  • Restart stops and restarts the current debugging session.

#sql server #sql #sql debugger #sql server #sql server stored procedure #ssms #t-sql queries