Azure Cosmos DB for the SQL Professional

I had a previous blog comparing Azure Cosmos DB to a relational database and one topic that it did not address that I want to now is how to handle reference tables that are common in the relational database world.

A big difference with Azure Cosmos DB, a NoSQL database, compared to a relational database is you will create a denormalized data model. Take a person record for example. You will embed all the information related to a person, such as their contact details and addresses, into a single JSON document. Retrieving a complete person record from the database is now a single read operation against a single container and for a single item. Updating a person record, with their contact details and addresses, is also a single write operation against a single item. By denormalizing data, your application typically will have better read performance and write performance and allow for a scale-out architecture since you don’t need to join tables.

(Side note: “container” is the generic term. Depending on the API, a specific term is used such as “collection” for the Core (SQL) API). Think of a container as one or more tables in the relational world. Going a little deeper, think of a container as a group of one or more “entities” which share the same partition key. A relational table shares a schema, but containers are not bound in that way.)

Embedding data works nicely for many cases but there are scenarios when denormalizing your data will cause more problems than it is worth. In a document database, you can have information in one document that relates to data in other documents. While there may be some use cases that are better suited for a relational database than in Azure Cosmos DB (see below), in most cases you can handle relationships in Azure Cosmos DB by creating a normalized data model for them, with the trade-off that it can require more round trips to the server to read data (but improve the efficiency of write operations since less data is written). In general, use normalized data models to represent one-to-many relationships or many-to-many relationships when related data changes frequently. The key is knowing whether the cost of the updates is greater than the cost of the queries.

When using a normalized data model, your application will need to handle creating the reference document. One way would be to use a change feed that triggers on the creation of a new document – the change feed essentially triggers an Azure function that creates the relationship record.

When using a normalized data model, your application will need to query the multiple documents that need to be joined (costing more money because it will use more request units), and do the joining within the application (i.e. join a main document with documents that contain the reference data) as you cannot do a “join” between documents within different containers in Azure Cosmos DB (joins between documents within the same container can be done via self-joins). Since every time you display a document it needs to search the entire container for the name, it would be best to put the other document type (the reference data) in a different container so you can have different partition keys for each document type (read up on how partitioning can make a big impact on performance and cost).

Note that “partitioning” in a relational database management system (RDBMS) compared to Azure Cosmos DB are different things: partitioning in Azure Cosmos DB refers to “sharding” or “horizontal partitioning“, where replica sets which contain both the data and copies of compute (database) resources operating in a “shared nothing” architecture (i.e. scaled “horizontally” where each compute resource (server node) operates independently of every other node, but with a programming model transparent to developers). Conversely, what is often referred to as “partitioning” in a RDBMS is purely a separation of data into separate file groups within a shared compute (database) environment. This is also often called “vertical partitioning”.

#do not use #sql

devblogs.microsoft.com

Azure Cosmos DB for the SQL Professional - Referencing Tables