zBoosting Performance for Small Distributed SQL Data Sets with Colocated Tables

An Introduction to Colocated Tables
In YugabyteDB v2.1, we released a new feature in beta: colocated tables. In this post, we’ll explain what colocated tables are in a distributed SQL database, why you would need them, and how to get started.

Relational databases often have a large number of tables and indexes. A lot of these tables are closely related and commonly queried together via joins or subqueries. In YugabyteDB, a scale-out distributed SQL database, each table will be split automatically into a number of shards. Sharding is the process of breaking up large tables into smaller chunks called shards that are spread across multiple servers. A shard (also referred to as a tablet) is a horizontal data partition that contains a subset of the total data set, and is responsible for serving a portion of the overall workload. The idea is to distribute data that can’t fit on a single node onto a cluster of database nodes.

So, let’s say you have 1000 tables, each with 1 index. If each table and index is split in 10 tablets, then there are 20000 tablets in the cluster. Each tablet uses its own RocksDB instance to store data. This is done so that the tablets can be uniformly distributed across multiple disks on a node. This example would then result in 20K RocksDB instances, increasing CPU, disk, and network overhead.

Having multiple tablets for small tables that are closely related can be detrimental to performance in a distributed SQL database. This is because complex queries involving joins and subqueries on these tables will result in a large fan-out, requiring lookups across multiple nodes or regions, and incurring network latency. Colocated tables allow you to store (or co-locate) such datasets on a single tablet, called the colocation tablet, thereby eliminating query fan-out and boosting performance. Note that the data in colocated tables is still replicated across multiple nodes, providing high availability, while also making reads much faster.

Starting with YugabyteDB v2.1, you can create a database that is colocated. This will cause all tables in that database to be stored on the same tablet. Large tables or tables with higher write throughput can be opted out of colocation. This will end up creating separate tablets for such tables. You can find more information in the colocated tables documentation.

yugabytedb colocated tables feature for boosting performance small data sets, data for all tables colocated database stored on same tablet

As shown in the above diagram, creating a colocated database will create a single tablet for that database which is replicated across multiple nodes. Data for all tables in that database will be stored on the same tablet. Tables that opt out of colocation will continue to be split into multiple shards.

yugabytedb distributed sql database colocated tables feature for boosting performance small data sets

Above is an example of a database using colocated tables. The “products”, “categories”, “stores”, and “warehouses” tables are colocated on a single tablet while the “orders” and “line_items” tables have their own set of tablets.

#databases #distributed sql #how it works #open source #postgresql #yugabytedb 2.1

zBoosting Performance for Small Distributed SQL Data Sets with Colocated Tables - SQL

The system catalog of YSQL (and PostgreSQL) consists of over 200 tables and relations to maintain metadata information about the database, such as the tables created in the database, columns . An Introduction to Colocated Tables In YugabyteDB v2.1, we released a new feature in beta: colocated tables. In this post, we’ll explain what colocated tables are in a distributed SQL database, why you would need them, and how to get started. Relational databases often have a large number of tables and indexes. A lot of these tables are closely related and commonly queried together via joins or subqueries. In YugabyteDB, a scale-out distributed SQL database, each table will be split automatically into a number of shards. Sharding is the process of breaking up large tables into smaller chunks called shards that are spread across multiple servers. A shard (also referred to as a tablet) is a horizontal data partition that contains a subset of the total data set, and is responsible for serving a portion of the overall workload. The idea is to distribute data that can’t fit on a single node onto a cluster of database nodes. So, let’s say you have 1000 tables, each with 1 index. If each table and index is split in 10 tablets, then there are 20000 tablets in the cluster. Each tablet uses its own RocksDB instance to store data. This is done so that the tablets can be uniformly distributed across multiple disks on a node. This example would then result in 20K RocksDB instances, increasing CPU, disk, and network overhead. Having multiple tablets for small tables that are closely related can be detrimental to performance in a distributed SQL database. This is because complex queries involving joins and subqueries on these tables will result in a large fan-out, requiring lookups across multiple nodes or regions, and incurring network latency. Colocated tables allow you to store (or co-locate) such datasets on a single tablet, called the colocation tablet, thereby eliminating query fan-out and boosting performance. Note that the data in colocated tables is still replicated across multiple nodes, providing high availability, while also making reads much faster. Starting with YugabyteDB v2.1, you can create a database that is colocated. This will cause all tables in that database to be stored on the same tablet. Large tables or tables with higher write throughput can be opted out of colocation. This will end up creating separate tablets for such tables. You can find more information in the colocated tables documentation. yugabytedb colocated tables feature for boosting performance small data sets, data for all tables colocated database stored on same tablet As shown in the above diagram, creating a colocated database will create a single tablet for that database which is replicated across multiple nodes. Data for all tables in that database will be stored on the same tablet. Tables that opt out of colocation will continue to be split into multiple shards. yugabytedb distributed sql database colocated tables feature for boosting performance small data sets Above is an example of a database using colocated tables. The “products”, “categories”, “stores”, and “warehouses” tables are colocated on a single tablet while the “orders” and “line_items” tables have their own set of tablets.

blog.yugabyte.com

zBoosting Performance for Small Distributed SQL Data Sets with Colocated Tables - SQL