Just like most kids these days do not seem to know where their meat comes from (hint: not growing on trees but could be from a lab) or why the sky is blue (hint: blue light has shorter, smaller waves), an alarming amount of developers do not (really) know where their data comes from and how it is processed & stored. This article should be a starting point into an engineer’s journey in understanding how that works. That is a very good thing regardless of what framework and no-code platform marketers are trying to teach you. Especially if data really is the new oil (apparently it has been for some years now), it might pay off to see how databases work.

Disclaimer: this is not a SQL tutorial

Without further adieu a simple SQL statement to fetch all users from a DB called users:

Disclaimer 2: the point is not to write fancy SQL statements but to showcase what is happening once you do

SELECT * FROM users;

So, how many of you actually know what is happening between pressing enter in your psql terminal and getting back that result?

Fundamentals

In order to understand this one has to start at the beginning — PostgresDB’s architecture.

At the end of the day it is a client/server application written in C++. A client is anyone who accesses the DB e.g. your psql terminal, your Lambda function, JDBC driver etc. and the server is the PGDB backend, which accepts your requests, stores the data etc.

Here is a basic architecture overview of PostgresDB that will be zoomed into throughout the article:

Image for post

This is kind of oldschool but it still works :) Source: SlideShare

The Path of a Query

There is a pretty good overview of the “path of a query” in the depths of the official PostgresDB documentation. Here is what you should know:

1. Establish connection, transmit query & wait for results

PGDB can handle multiple connections concurrently (through the “postmaster”, see architecture) and for each connection it forks a new process (“postgres”, see architecture) to handle the requests (e.g. SQL statements) of that connection. In other words it is a simple “process per user” client/server model. The postmaster handles the initial connection and authentication and then hands that connection over to a new postgres process. Those processes communicate between each other with a shared memory and semaphores to ensure the overall data integrity even with concurrent connections (remember ACID?)

#sql #data #data-science #relational-databases #postgres #data analysis

What’s behind a simple SQL query?
1.10 GEEK