SQL Indexes Explained, Pt. 1

Properly used, an SQL database index can be so effective that it might seem like magic. But the following series of exercises will show that underneath, the logic of most SQL indexes—and wielding them correctly—is quite straightforward.

In this series, SQL Indexes Explained, we will walk through the motivations for using indexes to access data and for designing indexes in the way it is done by all modern RDBMSes. We will then look at the algorithms used to return data for specific query patterns.

You don’t have to know much about indexes to be able to follow SQL Indexes Explained. There are just two preconditions:

Basic SQL knowledge
Basic knowledge of any programming language

The main topics SQL Indexes Explained will get into are:

Why we need SQL database indexes; visualizing execution plans using indexes
Index design: which indexes make a query fast and efficient
How we can write a query to effectively use indexes
The impact of the use of indexes in SQL on read/write efficiency
Covering indexes
Partitioning, its impact on reading and writing, and when to use it

This isn’t just an SQL index tutorial—it’s a deep dive into understanding the underlying mechanics of indexes.

We are going to figure out how an RDBMS uses indexes by doing exercises and analyzing our problem-solving methods. Our exercise material consists of read-only Google Sheets. To do an exercise, you can copy the Google Sheet (File → Make a copy) or copy its contents into your own Google Sheet.

In every exercise, we’ll show an SQL query that uses Oracle syntax. For dates, we will use the ISO 8601 format, YYYY-MM-DD.

Exercise 1: All of a Client’s Reservations

The first task—don’t do it just yet—is to find all rows from the Reservation spreadsheet for a specific client of a hotel reservation system, and copy them into your own spreadsheet, simulating the execution of the following query:

SELECT
        *
    FROM
        Reservations
    WHERE
        ClientID = 12;

But we want to follow a particular method.

Approach 1: No Sorting, No Filtering

For the first try, do not use any sorting or filtering features. Please, record the time spent. The resulting sheet should contain 73 rows.

This pseudocode illustrates the algorithm for accomplishing the task without sorting:

For each row from Reservations
  If Reservations.ClientID = 12 then fetch Reservations.*

In this case, we had to check all 841 rows to return and copy 73 rows satisfying the condition.

Approach 2: Sorting Only

For the second try, sort the sheet according to the value of the ClientID column. Do not use filters. Record the time and compare it with the time it took to complete the task without sorting data.

After sorting, the approach looks like this:

For each row from Reservations
  If ClientID = 12 then fetch Reservations.*
  Else if ClientID > 12 exit

This time, we had to check “only” 780 rows. If we could somehow jump to the first row, it would take even less time.

But if we would have to develop a program for the task, this solution would be even slower than the first one. That’s because we would have to sort all the data first, which means each row would have to be accessed at least once. This approach is good only if the sheet is already sorted in the desired order.

#sql

Exercise 1: All of a Client’s Reservations

Approach 1: No Sorting, No Filtering

Approach 2: Sorting Only

toptal.com

SQL Indexes Explained, Pt. 1