Merging and returning arrays

I've two functions,

CREATE OR REPLACE FUNCTION function_a(input varchar)
RETURNS setof integer AS $$
BEGIN
  RETURN QUERY
    SELECT somecolumn FROM some_things WHERE a_column = input;
END;
$$ LANGUAGE PLpgSQL;

CREATE OR REPLACE FUNCTION function_b(inputs varchar[])
RETURNS setof integer AS $$
DECLARE
input varchar;
result integer[];
BEGIN
FOREACH input IN ARRAY inputs LOOP
result := result || ARRAY[function_a(input)];
END LOOP;
END;
$$ LANGUAGE PLpgSQL;

I am running it like,

 SELECT function_b(ARRAY['a', 'b']);

The error,

ERROR:  query "SELECT result || ARRAY[function_a(input)]" returned more than one row
CONTEXT: PL/pgSQL function function_b(character varying[]) line 7 at assignment

All I want to do is to run a function over an array. I've always used scripting languages like Ruby to do this kind of stuff instead of using SQL, but I'm trying to learn SQL as it is much faster to get results on the db console itself. I wish it wasn't so frustrating.
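For reference, the usual fix for this error (a sketch, not tested here) is to stream rows with RETURN QUERY inside the loop instead of collecting them into a variable; RETURN QUERY appends each call's rows to the function's result set:

```sql
CREATE OR REPLACE FUNCTION function_b(inputs varchar[])
RETURNS setof integer AS $$
DECLARE
  input varchar;
BEGIN
  FOREACH input IN ARRAY inputs LOOP
    -- Append all rows produced by function_a(input) to the result set
    RETURN QUERY SELECT * FROM function_a(input);
  END LOOP;
END;
$$ LANGUAGE PLpgSQL;
```

A loop-free alternative should also work here: `SELECT function_a(i) FROM unnest(inputs) AS i;`.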

PostgreSQL with Nodejs

Most applications need to persist data at some point. It can be through files, local storage, cloud services, or, quite often, databases.

If you do not have particular requirements, relational database systems are usually a good default choice, and particularly PostgreSQL, a very powerful open source SQL server. Moreover, if you develop for the Nodejs platform you will not run short of tools and solutions.

So what would be the ideal library or tool for your next application?

If you are lucky enough to work for a company with the resources to maintain its own dedicated database team, you will probably only have to use the abstractions they provide in your language of choice (Javascript in our case) to call the stored procedures they have carefully crafted. In that case, you may see no SQL (Structured Query Language, the language used in many relational database systems) at all and will never leave your comfort zone. In practice, this case does not occur very often, and the truth is that many Nodejs developers, even so-called full stack developers, know very little about SQL and will instead look for abstractions in their favorite package repository.

Lately, I came across a very interesting article by Thomas Hunter describing the different levels of abstraction a database-related library may have. The idea was to dissuade you from using frameworks called ORMs (Object Relational Mappers) and to convince you to learn SQL. Of course, as often in software development, each kind of abstraction has its pros and cons, so let's quickly review what I learned in Thomas' article.

If you want more details you should probably read Thomas’ article and skip the sections below.

Drivers

Drivers provide almost no abstraction at all. They usually only handle connections and the communication protocol with the database. They may also parse the responses into adapted data structures (most of the time in Javascript it will be JSON objects and arrays). The de facto PostgreSQL driver for Nodejs is pg.

Drivers are normally very fast and the most flexible solution as they will let you manually write SQL queries yourself. On the other hand, it might quickly become tedious and dangerous to write dynamic queries by hand as it would involve a lot of string manipulation.
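To make that danger concrete, here is a hypothetical sketch (the `pool.query` call shown in the comment is the pg driver's parameterized form; the rest is made up for illustration):

```javascript
// Building SQL by string interpolation: a single quote in the input
// breaks the quoting of the literal and opens the door to SQL injection.
const naiveQuery = name => `SELECT * FROM users WHERE name = '${name}'`;

// With the pg driver you would instead pass placeholders and a values
// array and let the driver handle escaping (not run here):
//   pool.query('SELECT * FROM users WHERE name = $1', [name]);

console.log(naiveQuery("O'Hara"));
// → SELECT * FROM users WHERE name = 'O'Hara'   (malformed SQL)
```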

Query builders

With query builders, instead of manipulating strings, you will be using functions whose API is usually close to the SQL grammar. In Javascript, functions compose very well with patterns such as partial application, decoration, composition, recursion, higher-order function, etc.

They will often let you create your own abstractions and solve many problems in an elegant way. They also provide a framework with more explicit constraints than strings can offer (especially if you use some sort of types on top of the normal signatures), which usually helps you avoid mistakes and fits better with your everyday tools (IDEs and so on).

A very popular query builder library is knex.
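To illustrate the composability argument, here is a toy builder (deliberately not knex's real API, just a sketch): every step is a plain function returning a value, so partial queries can be stored, passed around, and extended before being rendered to SQL.

```javascript
// A miniature query builder: plain data in, plain functions over it.
const select = (...columns) => ({ columns, table: '', conditions: [] });
const from = (query, table) => ({ ...query, table });
const where = (query, clause) =>
  ({ ...query, conditions: [...query.conditions, clause] });
// Render the accumulated description to a SQL string at the very end.
const toSQL = ({ columns, table, conditions }) =>
  `SELECT ${columns.join(', ')} FROM ${table}` +
  (conditions.length ? ` WHERE ${conditions.join(' AND ')}` : '');

const users = from(select('first_name', 'last_name'), 'users');
console.log(toSQL(where(users, "last_name = 'Renard'")));
// → SELECT first_name, last_name FROM users WHERE last_name = 'Renard'
```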

ORM

ORMs are the highest level of abstraction, and such frameworks usually come with a lot of features. The promise is that you create a domain data model with idioms you know, such as classes and objects, and the framework translates it into a database structure along with convenient ways to query it through the models you have defined. The obvious advantage is that you can handle everything in your language of choice and with concepts familiar to you, while the underlying database entities and queries stay very far from your application code. However, the way we think about the application data model sometimes does not quite fit the way we would build a database for that model: this is known as the object-relational impedance mismatch. Such libraries have created passionate debates between their fans and the others. Of course, they definitely have their uses and the problem is not black or white, but there are recurrent criticisms:

  • When you learn an ORM (and this is already many hours of work) you learn syntax and concepts you probably will not use outside of this particular ORM.
  • The SQL queries generated by an ORM may not be very efficient, in the sense that the ORM has to tackle specialized queries in a more generalist way. Moreover, performance can also degrade because an ORM will often not return raw data but model instances built from it (in order to provide features like validation, lazy loading, etc.). You often trade performance for features.
  • An ORM cannot generate some (sometimes basic) queries, or gets them wrong when they are used through its models. This problem is easily worked around, however, as most ORMs provide a low-level query interface to write the queries you want directly with the underlying driver.
  • You may not get to design the database: the schema can be generated for you, which may sometimes lead to very poor database design.

The most popular ORM for Nodejs is Sequelize.

I think we can do better…

Introducing Ship-Hold

Note: all the examples use the database generated by the script in the annexes at the end of this article.

In the 1970s (yes, almost 50 years ago!), Edgar F. Codd created relational algebra. SQL became the most common implementation of that algebra and has evolved a lot to become one of the most powerful and most battle-tested query languages. It now includes procedural capabilities and extensions (especially with PostgreSQL) which go beyond the scope of the algebra. Yet SQL is often neglected by application developers even though its capabilities are enormous.

Through products (joins) you can cross data from different tables and organize your result data in infinite ways, while with SQL functions you can aggregate data to format it the way you want. A single query, although quite verbose, is probably far more flexible than anything you could achieve in Javascript, even by composing small functions together.

You want to fetch all the posts, including a field author with the first name and the last name of the author:

SELECT
    "posts".*,
    concat_ws(' ', "users"."first_name", "users"."last_name") AS "author"
FROM "posts" JOIN "users" USING ("user_id");

Actually, you would prefer to include all the author's data as a JSON object:

SELECT
    "posts".*,
    to_json("users".*) AS "author"
FROM "posts" JOIN "users" USING ("user_id");

Or you want to know who are the five users who have published the most comments:

WITH
    "commenters" AS (SELECT COUNT(*), "user_id" FROM "comments" GROUP BY "user_id" ORDER BY "count" DESC LIMIT 5)
SELECT "users".* FROM "commenters" JOIN "users" USING ("user_id");

Doing the same thing in Javascript, treating your tables as arrays, would require a great deal of code.
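To make that point concrete, here is roughly what the "top five commenters" CTE replaces on the Javascript side (a sketch over made-up in-memory arrays standing in for the tables):

```javascript
// GROUP BY user_id + COUNT(*) + ORDER BY count DESC + LIMIT n + JOIN users,
// hand-rolled over plain arrays.
const topCommenters = (comments, users, n = 5) => {
  const counts = new Map();
  for (const { user_id } of comments) {
    counts.set(user_id, (counts.get(user_id) || 0) + 1); // COUNT(*) per user
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])                         // ORDER BY count DESC
    .slice(0, n)                                         // LIMIT n
    .map(([id]) => users.find(u => u.user_id === id));   // JOIN users
};
```

And this version still ignores ties, NULL handling, and indexing, all of which the database gives you for free.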

You are convinced and decide to learn SQL: great! Now, what if you could write SQL queries in Javascript, with function calls, the same way you would write them in pure SQL? Good news: with ship-hold you can.

Let’s say you want to find all the posts about SQL published after 2010 including their author’s data ordered by publication date:

const posts = await sh
    .select('posts.*',{
        value: toJson('"users".*'),
        as:'author'
    })
    .from('posts')
    .where('title','ilike','%sql%')
    .and('published_at','>',new Date(2010,1,1))
    .join('users')
    .on('posts.user_id','"users"."user_id"')
    .orderBy('published_at','desc')
    .run();

And ship-hold query builders are regular functions, so you can add all the syntactic sugar you want. Even better, you can compose builders together as you would with SQL subqueries. For example, you could get the same result as the third example above with the following code:

// A higher order function
const fromTable = table => (...args) => sh.select(...args).from(table);

const fromComments = fromTable('comments');

const fromUsers = fromTable('users');
// A sub query: comment count per user
const getTopFiveCommenters = {
    value: fromComments(count('*'), 'user_id')
        .groupBy('user_id')
        .orderBy('count', 'desc')
        .limit(5),
    as: 'talkative'
};

const users = await fromUsers()
    .where('user_id', 'IN', sh.select('user_id').from(getTopFiveCommenters))
    .run();

It basically has the power of SQL with the expressiveness of Javascript functions!

Eager loading: a not so easy problem.

SQL is very powerful, but at some point it becomes quite complicated for a human being to write a query that gets the expected result. This is particularly true when you want to fetch related data from different tables and start applying pagination to your result set or to a sub-result set. That's one of the reasons ORMs exist: they help you generate such queries.

Let’s say you want to get users with the posts they have written in case they have written some:

SELECT * FROM "users" LEFT JOIN "posts" USING ("user_id");

Ok, now you want the users whose last name is Renard, and you want only the first five Renards ordered alphabetically by first name. Of course, you still want the articles they may have written, too.

SELECT * FROM "users" LEFT JOIN "posts" USING ("user_id") WHERE "last_name" = 'Renard' ORDER BY "first_name" LIMIT 5;

Is it correct? Actually, no: the limit is applied to the join, and since several rows may correspond to the same user (one for each written article), this query will miss data.

SELECT * FROM (SELECT * FROM "users" WHERE "last_name" = 'Renard' ORDER BY "first_name" LIMIT 5) AS "users" LEFT JOIN "posts" USING ("user_id") ORDER BY "first_name";

This query would do the job, although the data parsed by the driver would then need to be aggregated: again, several rows may correspond to the same user!
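For comparison, one way (a sketch) to let the database itself do that aggregation is a correlated subquery with json_agg, so each user comes back as exactly one row carrying their posts as a JSON array:

```sql
SELECT "u".*,
       (SELECT json_agg("p".*)
        FROM "posts" "p"
        WHERE "p"."user_id" = "u"."user_id") AS "posts"
FROM "users" "u"
WHERE "u"."last_name" = 'Renard'
ORDER BY "u"."first_name"
LIMIT 5;
```

Users without posts get a NULL "posts" field rather than disappearing, mirroring the LEFT JOIN behaviour.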

Now you also want to paginate the included posts: instead of getting all the articles written by a given user, you want the three last published about Javascript. Ah, and you also want to include within these articles the five last published comments with their author's data…

Ok, you get the point, it becomes quite complicated to write a query which will get you the right result.

Thankfully, ORMs and similar tools help you get the right result. How they do it varies from one library to another. Some make multiple queries, buffer the results, and aggregate them programmatically. Others inline the join queries, etc. Each strategy can get you the result, but more or less efficiently.

Ship-hold is not an ORM, but it gives you the ability to define services related to a particular table. A service simplifies the syntax of queries against its table, and it also lets you define relations between services. This greatly helps when writing queries with inclusions: you basically compose builders the way you would construct a sentence in English.

const Users = sh.service({
    table: 'users',
    primaryKey: 'user_id'
});

const Posts = sh.service({
    table: 'posts',
    primaryKey: 'post_id'
});

const Comments = sh.service({
    table: 'comments',
    primaryKey: 'comment_id'
});
Users.hasMany(Posts);
Posts.belongsTo(Users, 'user_id', 'author');

Users.hasMany(Comments);
Posts.hasMany(Comments);
Comments.belongsTo(Users, 'user_id', 'author');
Comments.belongsTo(Posts, 'post_id', 'article');

Then get the result for the very complicated request mentioned above:

// Five last comments with their authors' data
const fiveLastPublishedComments = Comments
    .select()
    .orderBy('published_at', 'desc')
    .limit(5)
    .include(Users);
// Three last Javascript posts
const threeLastPostAboutJavascript = Posts
    .select()
    .where('title', 'ilike', '%javascript%')
    .orderBy('published_at', 'desc')
    .limit(3)
    .include(fiveLastPublishedComments);

// finally
const renards = await Users
    .select()
    .where('last_name', 'Renard')
    .orderBy('first_name')
    .limit(5)
    .include(threeLastPostAboutJavascript)
    .run();

And done!

I dare you to do it so easily with another tool :)

About performance

You might wonder how well the generated queries perform. It is very difficult to compare data access libraries, as the results may change quite a lot depending on the database itself: the indexes created, the amount of data, etc.

However, ship-hold tries to leverage the database features as much as possible, where a lot of ORMs duplicate the effort in their Javascript code base. A good example is aggregation. Where many ORMs perform several queries, buffer the results (often with redundant data), then aggregate them programmatically, ship-hold asks the database to do the aggregation with its built-in mechanisms. It is also one of the reasons ship-hold is small compared to other libraries: the installed bundle weighs 771 kb, including the pg driver and other dependencies.

Database makers strive to improve performance, usually writing faster code in C++, with better algorithms than I personally could come up with on the JS side.

Any performance improvement in the database engine would therefore improve ship-hold. Moreover, join queries and subqueries carry a share of redundant data. If you do the aggregation at the database level, less data needs to go through the wires, which might improve performance compared to a solution where data is buffered on the JS side and cleaned/aggregated there. This is especially true if your database is on a remote server.

Nevertheless, you can have a look at the numbers of a sample application implemented with various popular libraries.

If you think ship-hold is worth it, please share the project link and star the repository or clap for this article. If you have any recommendations to improve it, don't hesitate to contribute. Thanks for reading.

Annexes

Database script

DROP TABLE IF EXISTS posts, users, comments, tags, posts_tags CASCADE;
CREATE TABLE users (
    user_id serial PRIMARY KEY,
    email VARCHAR(100) UNIQUE NOT NULL,
    biography TEXT,
    first_name VARCHAR(100) NOT NULL, 
    last_name VARCHAR(100) NOT NULL 
);

CREATE TABLE posts(
    post_id serial PRIMARY KEY,
    title VARCHAR NOT NULL,
    content TEXT,
    published_at TIMESTAMP,
    user_id integer REFERENCES users (user_id)
);

CREATE TABLE comments(
    comment_id serial PRIMARY KEY,
    content TEXT,
    published_at TIMESTAMP,
    user_id integer REFERENCES users (user_id),
    post_id integer REFERENCES posts (post_id)
);

CREATE TABLE tags(
    tag VARCHAR PRIMARY KEY,
    description TEXT
);

CREATE TABLE posts_tags(
    post_id INTEGER REFERENCES posts,
    tag VARCHAR REFERENCES tags,
    PRIMARY KEY (post_id, tag)
);

By: Laurent Renard

An Introduction to Queries in PostgreSQL

In this guide, we will discuss the basic syntax of SQL queries as well as some of the more commonly-employed functions and operators. We will also practice making SQL queries using some sample data in a PostgreSQL database.

Introduction

Databases are a key component of many websites and applications, and are at the core of how data is stored and exchanged across the internet. One of the most important aspects of database management is the practice of retrieving data from a database, whether it’s on an ad hoc basis or part of a process that’s been coded into an application. There are several ways to retrieve information from a database, but one of the most commonly-used methods is performed through submitting queries through the command line.

In relational database management systems, a query is any command used to retrieve data from a table. In Structured Query Language (SQL), queries are almost always made using the SELECT statement.

PostgreSQL, often shortened to “Postgres,” is a relational database management system with an object-oriented approach, meaning that information can be represented as objects or classes in PostgreSQL schemas. PostgreSQL aligns closely with standard SQL, although it also includes some features not found in other relational database systems.


Prerequisites

In general, the commands and concepts presented in this guide can be used on any Linux-based operating system running any SQL database software. However, it was written specifically with an Ubuntu 18.04 server running PostgreSQL in mind. To set this up, you will need the following:

With this setup in place, we can begin the tutorial.


Creating a Sample Database

Before we can begin making queries in SQL, we will first create a database and a couple tables, then populate these tables with some sample data. This will allow you to gain some hands-on experience when you begin making queries later on.

For the sample database we’ll use throughout this guide, imagine the following scenario:

You and several of your friends all celebrate your birthdays with one another. On each occasion, the members of the group head to the local bowling alley, participate in a friendly tournament, and then everyone heads to your place where you prepare the birthday-person’s favorite meal.

Now that this tradition has been going on for a while, you’ve decided to begin tracking the records from these tournaments. Also, to make planning dinners easier, you decide to create a record of your friends’ birthdays and their favorite entrees, sides, and desserts. Rather than keep this information in a physical ledger, you decide to exercise your database skills by recording it in a PostgreSQL database.

To begin, open up a PostgreSQL prompt as your postgres superuser:

sudo -u postgres psql

Note: If you followed all the steps of the prerequisite tutorial on Installing PostgreSQL on Ubuntu 18.04, you may have configured a new role for your PostgreSQL installation. In this case, you can connect to the Postgres prompt with the following command, substituting sammy with your own username:

sudo -u sammy psql

Next, create the database by running:

CREATE DATABASE birthdays;

Then select this database by typing:

\c birthdays

Next, create two tables within this database. We’ll use the first table to track your friends’ records at the bowling alley. The following command will create a table called tourneys with columns for the name of each of your friends, the number of tournaments they’ve won (wins), their all-time best score, and what size bowling shoe they wear (size):

CREATE TABLE tourneys (
    name varchar(30),
    wins real,
    best real,
    size real
);

Once you run the CREATE TABLE command and populate it with column headings, you’ll receive the following output:

Output
CREATE TABLE

Populate the tourneys table with some sample data:

INSERT INTO tourneys (name, wins, best, size)
VALUES ('Dolly', '7', '245', '8.5'),
('Etta', '4', '283', '9'),
('Irma', '9', '266', '7'),
('Barbara', '2', '197', '7.5'),
('Gladys', '13', '273', '8');

You’ll receive the following output:

Output
INSERT 0 5

Following this, create another table within the same database which we’ll use to store information about your friends’ favorite birthday meals. The following command creates a table named dinners with columns for the name of each of your friends, their birthdate, their favorite entree, their preferred side dish, and their favorite dessert:

CREATE TABLE dinners (
    name varchar(30),
    birthdate date,
    entree varchar(30),
    side varchar(30),
    dessert varchar(30)
);

Similarly for this table, you’ll receive feedback verifying that the table was created:

Output
CREATE TABLE

Populate this table with some sample data as well:

INSERT INTO dinners (name, birthdate, entree, side, dessert)
VALUES ('Dolly', '1946-01-19', 'steak', 'salad', 'cake'),
('Etta', '1938-01-25', 'chicken', 'fries', 'ice cream'),
('Irma', '1941-02-18', 'tofu', 'fries', 'cake'),
('Barbara', '1948-12-25', 'tofu', 'salad', 'ice cream'),
('Gladys', '1944-05-28', 'steak', 'fries', 'ice cream');

Output
INSERT 0 5

Once that command completes successfully, you’re done setting up your database. Next, we’ll go over the basic command structure of SELECT queries.


Understanding SELECT Statements

As mentioned in the introduction, SQL queries almost always begin with the SELECT statement. SELECT is used in queries to specify which columns from a table should be returned in the result set. Queries also almost always include FROM, which is used to specify which table the statement will query.

Generally, SQL queries follow this syntax:

SELECT column_to_select FROM table_to_select WHERE certain_conditions_apply;

By way of example, the following statement will return the entire name column from the dinners table:

SELECT name FROM dinners;

Output
 name

Dolly
Etta
Irma
Barbara
Gladys
(5 rows)

You can select multiple columns from the same table by separating their names with a comma, like this:

SELECT name, birthdate FROM dinners;

Output
 name | birthdate
---------+------------
Dolly | 1946-01-19
Etta | 1938-01-25
Irma | 1941-02-18
Barbara | 1948-12-25
Gladys | 1944-05-28
(5 rows)

Instead of naming a specific column or set of columns, you can follow the SELECT operator with an asterisk (*) which serves as a placeholder representing all the columns in a table. The following command returns every column from the tourneys table:

SELECT * FROM tourneys;

Output
 name | wins | best | size
---------+------+------+------
Dolly | 7 | 245 | 8.5
Etta | 4 | 283 | 9
Irma | 9 | 266 | 7
Barbara | 2 | 197 | 7.5
Gladys | 13 | 273 | 8
(5 rows)

WHERE is used in queries to filter records that meet a specified condition, and any rows that do not meet that condition are eliminated from the result. A WHERE clause typically follows this syntax:

. . . WHERE column_name comparison_operator value

The comparison operator in a WHERE clause defines how the specified column should be compared against the value. Here are some common SQL comparison operators:

Operator      What it does
=             tests for equality
!=            tests for inequality
<             tests for less-than
>             tests for greater-than
<=            tests for less-than or equal-to
>=            tests for greater-than or equal-to
BETWEEN       tests whether a value lies within a given range
IN            tests whether a row's value is contained in a set of specified values
EXISTS        tests whether rows exist, given the specified conditions
LIKE          tests whether a value matches a specified string
IS NULL       tests for NULL values
IS NOT NULL   tests for all values other than NULL

For example, if you wanted to find Irma's shoe size, you could use the following query:

SELECT size FROM tourneys WHERE name = 'Irma';

Output
 size
7

(1 row)
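The other operators slot into the same pattern. For instance, BETWEEN filters on a range; with the sample data above, this should match Dolly (7 wins) and Irma (9 wins):

```sql
SELECT name, wins FROM tourneys WHERE wins BETWEEN 5 AND 10;
```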

SQL allows the use of wildcard characters, and these are especially handy when used in WHERE clauses. Percentage signs (%) represent zero or more unknown characters, and underscores (_) represent a single unknown character. These are useful if you’re trying to find a specific entry in a table, but aren’t sure of what that entry is exactly. To illustrate, let’s say that you’ve forgotten the favorite entree of a few of your friends, but you’re certain this particular entree starts with a “t.” You could find its name by running the following query:

SELECT entree FROM dinners WHERE entree LIKE 't%';

Output
 entree

tofu
tofu
(2 rows)

Based on the output above, we see that the entree we have forgotten is tofu.

There may be times when you’re working with databases that have columns or tables with relatively long or difficult-to-read names. In these cases, you can make these names more readable by creating an alias with the AS keyword. Aliases created with AS are temporary, and only exist for the duration of the query for which they’re created:

SELECT name AS n, birthdate AS b, dessert AS d FROM dinners;

Output
 n | b | d
---------+------------+-----------
Dolly | 1946-01-19 | cake
Etta | 1938-01-25 | ice cream
Irma | 1941-02-18 | cake
Barbara | 1948-12-25 | ice cream
Gladys | 1944-05-28 | ice cream
(5 rows)

Here, we have told SQL to display the name column as n, the birthdate column as b, and the dessert column as d.

The examples we’ve gone through up to this point include some of the more frequently-used keywords and clauses in SQL queries. These are useful for basic queries, but they aren’t helpful if you’re trying to perform a calculation or derive a scalar value (a single value, as opposed to a set of multiple different values) based on your data. This is where aggregate functions come into play.


Aggregate Functions

Oftentimes, when working with data, you don’t necessarily want to see the data itself. Rather, you want information about the data. The SQL syntax includes a number of functions that allow you to interpret or run calculations on your data just by issuing a SELECT query. These are known as aggregate functions.

The COUNT function counts and returns the number of rows that match a certain criteria. For example, if you’d like to know how many of your friends prefer tofu for their birthday entree, you could issue this query:

SELECT COUNT(entree) FROM dinners WHERE entree = 'tofu';

Output
 count
 2

(1 row)

The AVG function returns the average (mean) value of a column. Using our example table, you could find the average best score amongst your friends with this query:

SELECT AVG(best) FROM tourneys;

Output
 avg

252.8
(1 row)

SUM is used to find the total sum of a given column. For instance, if you’d like to see how many games you and your friends have bowled over the years, you could run this query:

SELECT SUM(wins) FROM tourneys;

Output
 sum

35
(1 row)

Note that the AVG and SUM functions will only work correctly when used with numeric data. If you try to use them on non-numerical data, it will result in either an error or just 0, depending on which RDBMS you’re using:

SELECT SUM(entree) FROM dinners;

Output
ERROR: function sum(character varying) does not exist
LINE 1: select sum(entree) from dinners;
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

MIN is used to find the smallest value within a specified column. You could use this query to see what the worst overall bowling record is so far (in terms of number of wins):

SELECT MIN(wins) FROM tourneys;

Output
 min

2
(1 row)

Similarly, MAX is used to find the largest numeric value in a given column. The following query will show the best overall bowling record:

SELECT MAX(wins) FROM tourneys;

Output
 max

13
(1 row)

Unlike SUM and AVG, the MIN and MAX functions can be used for both numeric and alphabetic data types. When run on a column containing string values, the MIN function will show the first value alphabetically:

SELECT MIN(name) FROM dinners;

Output
 min

Barbara
(1 row)

Likewise, when run on a column containing string values, the MAX function will show the last value alphabetically:

SELECT MAX(name) FROM dinners;

Output
 max

Irma
(1 row)

Aggregate functions have many uses beyond what was described in this section. They’re particularly useful when used with the GROUP BY clause, which is covered in the next section along with several other query clauses that affect how result sets are sorted.


Manipulating Query Outputs

In addition to the FROM and WHERE clauses, there are several other clauses which are used to manipulate the results of a SELECT query. In this section, we will explain and provide examples for some of the more commonly-used query clauses.

One of the most frequently-used query clauses, aside from FROM and WHERE, is the GROUP BY clause. It’s typically used when you’re performing an aggregate function on one column, but in relation to matching values in another.

For example, let’s say you wanted to know how many of your friends prefer each of the three entrees you make. You could find this info with the following query:

SELECT COUNT(name), entree FROM dinners GROUP BY entree;

Output
 count | entree
-------+---------
1 | chicken
2 | steak
2 | tofu
(3 rows)

The ORDER BY clause is used to sort query results. By default, numeric values are sorted in ascending order, and text values are sorted in alphabetical order. To illustrate, the following query lists the name and birthdate columns, but sorts the results by birthdate:

SELECT name, birthdate FROM dinners ORDER BY birthdate;

Output
 name | birthdate
---------+------------
Etta | 1938-01-25
Irma | 1941-02-18
Gladys | 1944-05-28
Dolly | 1946-01-19
Barbara | 1948-12-25
(5 rows)

Notice that the default behavior of ORDER BY is to sort the result set in ascending order. To reverse this and have the result set sorted in descending order, close the query with DESC:

SELECT name, birthdate FROM dinners ORDER BY birthdate DESC;

Output
 name | birthdate
---------+------------
Barbara | 1948-12-25
Dolly | 1946-01-19
Gladys | 1944-05-28
Irma | 1941-02-18
Etta | 1938-01-25
(5 rows)

As mentioned previously, the WHERE clause is used to filter results based on specific conditions. However, if you use the WHERE clause with an aggregate function, it will return an error, as is the case with the following attempt to find which sides are the favorite of at least three of your friends:

SELECT COUNT(name), side FROM dinners WHERE COUNT(name) >= 3;

Output
ERROR: aggregate functions are not allowed in WHERE
LINE 1: SELECT COUNT(name), side FROM dinners WHERE COUNT(name) >= 3...

The HAVING clause was added to SQL to provide functionality similar to that of the WHERE clause while also being compatible with aggregate functions. It's helpful to think of the difference between these two clauses as being that WHERE applies to individual records, while HAVING applies to groups of records. To this end, any time you issue a HAVING clause, the GROUP BY clause must also be present.

The following example is another attempt to find which side dishes are the favorite of at least three of your friends, although this one will return a result without error:

SELECT COUNT(name), side FROM dinners GROUP BY side HAVING COUNT(name) >= 3;

Output
 count | side
-------+-------
3 | fries
(1 row)

Aggregate functions are useful for summarizing the results of a particular column in a given table. However, there are many cases where it’s necessary to query the contents of more than one table. We’ll go over a few ways you can do this in the next section.


Querying Multiple Tables

More often than not, a database contains multiple tables, each holding different sets of data. SQL provides a few different ways to run a single query on multiple tables.

The JOIN clause can be used to combine rows from two or more tables in a query result. It does this by finding a related column between the tables and sorting the results appropriately in the output.

SELECT statements that include a JOIN clause generally follow this syntax:

SELECT table1.column1, table2.column2
FROM table1
JOIN table2 ON table1.related_column=table2.related_column;

Note that because JOIN clauses compare the contents of more than one table, the previous example specifies which table to select each column from by preceding the name of the column with the name of the table and a period. You can specify which table a column should be selected from like this for any query, although it’s not necessary when selecting from a single table, as we’ve done in the previous sections. Let’s walk through an example using our sample data.

Imagine that you wanted to buy each of your friends a pair of bowling shoes as a birthday gift. Because the information about your friends’ birthdates and shoe sizes is held in separate tables, you could query both tables separately and then compare the results from each. With a JOIN clause, though, you can find all the information you want with a single query:

SELECT tourneys.name, tourneys.size, dinners.birthdate
FROM tourneys
JOIN dinners ON tourneys.name=dinners.name;

Output
  name   | size | birthdate
---------+------+------------
 Dolly   |  8.5 | 1946-01-19
 Etta    |    9 | 1938-01-25
 Irma    |    7 | 1941-02-18
 Barbara |  7.5 | 1948-12-25
 Gladys  |    8 | 1944-05-28
(5 rows)

The JOIN clause used in this example, without any other arguments, is an inner JOIN clause. This means that it selects all the records that have matching values in both tables and includes them in the result set, while any records that aren’t matched are excluded. To illustrate this idea, let’s add a new row to each table that doesn’t have a corresponding entry in the other:

INSERT INTO tourneys (name, wins, best, size)
VALUES ('Bettye', '0', '193', '9');

INSERT INTO dinners (name, birthdate, entree, side, dessert)
VALUES ('Lesley', '1946-05-02', 'steak', 'salad', 'ice cream');

Then, re-run the previous SELECT statement with the JOIN clause:

SELECT tourneys.name, tourneys.size, dinners.birthdate
FROM tourneys
JOIN dinners ON tourneys.name=dinners.name;

Output
  name   | size | birthdate
---------+------+------------
 Dolly   |  8.5 | 1946-01-19
 Etta    |    9 | 1938-01-25
 Irma    |    7 | 1941-02-18
 Barbara |  7.5 | 1948-12-25
 Gladys  |    8 | 1944-05-28
(5 rows)

Notice that, because the tourneys table has no entry for Lesley and the dinners table has no entry for Bettye, those records are absent from this output.

It is possible, though, to return all the records from one of the tables using an outer JOIN clause. Outer JOIN clauses are written as either LEFT JOIN, RIGHT JOIN, or FULL JOIN.

A LEFT JOIN clause returns all the records from the “left” table and only the matching records from the right table. In the context of outer joins, the left table is the one referenced by the FROM clause, and the right table is any other table referenced after the JOIN statement.

Run the previous query again, but this time use a LEFT JOIN clause:

SELECT tourneys.name, tourneys.size, dinners.birthdate
FROM tourneys
LEFT JOIN dinners ON tourneys.name=dinners.name;

This command will return every record from the left table (in this case, tourneys) even if it doesn’t have a corresponding record in the right table. Any time there isn’t a matching record from the right table, it’s returned as a blank value or NULL, depending on your RDBMS:

Output
  name   | size | birthdate
---------+------+------------
 Dolly   |  8.5 | 1946-01-19
 Etta    |    9 | 1938-01-25
 Irma    |    7 | 1941-02-18
 Barbara |  7.5 | 1948-12-25
 Gladys  |    8 | 1944-05-28
 Bettye  |    9 |
(6 rows)

Now run the query again, this time with a RIGHT JOIN clause:

SELECT tourneys.name, tourneys.size, dinners.birthdate
FROM tourneys
RIGHT JOIN dinners ON tourneys.name=dinners.name;

This will return all the records from the right table (dinners). Because Lesley’s birthdate is recorded in the right table, but there is no corresponding row for her in the left table, the name and size columns will return as blank values in that row:

Output
  name   | size | birthdate
---------+------+------------
 Dolly   |  8.5 | 1946-01-19
 Etta    |    9 | 1938-01-25
 Irma    |    7 | 1941-02-18
 Barbara |  7.5 | 1948-12-25
 Gladys  |    8 | 1944-05-28
         |      | 1946-05-02
(6 rows)

Note that left and right joins can be written as LEFT OUTER JOIN or RIGHT OUTER JOIN, although the OUTER part of the clause is implied. Likewise, specifying INNER JOIN will produce the same result as just writing JOIN.
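
These equivalences are easy to verify with a quick sketch. The snippet below uses Python’s sqlite3 module and a hypothetical two-row version of each table; SQLite accepts the same JOIN spellings as PostgreSQL.

```python
import sqlite3

# Hypothetical miniature tables: one name matches, one in each does not.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE tourneys (name TEXT, size REAL);
CREATE TABLE dinners  (name TEXT, birthdate TEXT);
INSERT INTO tourneys VALUES ('Dolly', 8.5), ('Bettye', 9);
INSERT INTO dinners  VALUES ('Dolly', '1946-01-19'), ('Lesley', '1946-05-02');
""")

def run(sql):
    return cur.execute(sql).fetchall()

inner = run("SELECT t.name FROM tourneys t JOIN dinners d ON t.name = d.name")
inner_explicit = run("SELECT t.name FROM tourneys t INNER JOIN dinners d ON t.name = d.name")
left = run("SELECT t.name FROM tourneys t LEFT JOIN dinners d ON t.name = d.name")
left_outer = run("SELECT t.name FROM tourneys t LEFT OUTER JOIN dinners d ON t.name = d.name")

print(inner == inner_explicit)  # True: JOIN is shorthand for INNER JOIN
print(left == left_outer)       # True: OUTER is implied in LEFT JOIN
```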

There is a fourth join clause called FULL JOIN available for some RDBMS distributions, including PostgreSQL. A FULL JOIN will return all the records from each table, including any null values:

SELECT tourneys.name, tourneys.size, dinners.birthdate
FROM tourneys
FULL JOIN dinners ON tourneys.name=dinners.name;

Output
  name   | size | birthdate
---------+------+------------
 Dolly   |  8.5 | 1946-01-19
 Etta    |    9 | 1938-01-25
 Irma    |    7 | 1941-02-18
 Barbara |  7.5 | 1948-12-25
 Gladys  |    8 | 1944-05-28
 Bettye  |    9 |
         |      | 1946-05-02
(7 rows)

Note: As of this writing, the FULL JOIN clause is not supported by either MySQL or MariaDB.

As an alternative to using FULL JOIN to query all the records from multiple tables, you can use the UNION clause.

The UNION operator works slightly differently than a JOIN clause: rather than presenting columns from multiple tables side by side in a single SELECT statement, UNION combines the results of two SELECT statements into a single result set.

To illustrate, run the following query:

SELECT name FROM tourneys UNION SELECT name FROM dinners;

This query will remove any duplicate entries, which is the default behavior of the UNION operator:

Output
  name
---------
 Irma
 Etta
 Bettye
 Gladys
 Barbara
 Lesley
 Dolly
(7 rows)

To return all entries (including duplicates) use the UNION ALL operator:

SELECT name FROM tourneys UNION ALL SELECT name FROM dinners;

Output
  name
---------
 Dolly
 Etta
 Irma
 Barbara
 Gladys
 Bettye
 Dolly
 Etta
 Irma
 Barbara
 Gladys
 Lesley
(12 rows)

The names and number of the columns in the result set reflect the name and number of columns queried by the first SELECT statement. Note that when using UNION to query multiple columns from more than one table, each SELECT statement must query the same number of columns, the respective columns must have compatible data types, and the columns in each SELECT statement must be in the same order. The following example shows what might result if you use a UNION clause on two SELECT statements that query a different number of columns:

SELECT name FROM dinners UNION SELECT name, wins FROM tourneys;

Output
ERROR:  each UNION query must have the same number of columns
LINE 1: SELECT name FROM dinners UNION SELECT name, wins FROM tourne...

Another way to query multiple tables is through the use of subqueries. Subqueries (also known as inner or nested queries) are queries enclosed within another query. These are useful in cases where you’re trying to filter the results of a query against the result of a separate aggregate function.

To illustrate this idea, say you want to know which of your friends have won more matches than Barbara. Rather than querying how many matches Barbara has won then running another query to see who has won more games than that, you can calculate both with a single query:

SELECT name, wins FROM tourneys
WHERE wins > (
    SELECT wins FROM tourneys WHERE name = 'Barbara'
);

Output
  name  | wins
--------+------
 Dolly  |    7
 Etta   |    4
 Irma   |    9
 Gladys |   13
(4 rows)

The subquery in this statement was run only once; it only needed to find the value from the wins column in the same row as Barbara in the name column, and the data returned by the subquery and outer query are independent of one another. There are cases, though, where the outer query must first read every row in a table and compare those values against the data returned by the subquery in order to return the desired data. In this case, the subquery is referred to as a correlated subquery.

The following statement is an example of a correlated subquery. This query seeks to find which of your friends have won more games than is the average for those with the same shoe size:

SELECT name, size FROM tourneys AS t
WHERE wins > (
    SELECT AVG(wins) FROM tourneys WHERE size = t.size
);

To complete this query, the database first reads each row’s name and size from the outer query. It then compares each of those rows against the result of the inner query, which calculates the average number of wins for friends with the same shoe size. Because only two friends have the same shoe size, there can only be one row in the result set:

Output
 name | size
------+------
 Etta |    9
(1 row)
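
To see the correlated subquery being re-run per outer row, here is a sketch using Python’s sqlite3 module. The data is hypothetical (Barbara’s exact win count never appears in this tutorial, so 3 is assumed); only Etta and Bettye share a shoe size, so only Etta can exceed her size group’s average.

```python
import sqlite3

# Hypothetical sample data; Barbara's win count (3) is an assumption.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE tourneys (name TEXT, wins INTEGER, size REAL);
INSERT INTO tourneys VALUES
  ('Dolly', 7, 8.5), ('Etta', 4, 9), ('Irma', 9, 7),
  ('Barbara', 3, 7.5), ('Gladys', 13, 8), ('Bettye', 0, 9);
""")

# The inner query references t.size, so it is evaluated once per outer row:
# for a friend with a unique size, the average equals their own win count,
# so only Etta (4 wins vs. an average of 2 for size 9) qualifies.
rows = cur.execute("""
    SELECT name, size FROM tourneys AS t
    WHERE wins > (SELECT AVG(wins) FROM tourneys WHERE size = t.size)
""").fetchall()
print(rows)  # [('Etta', 9.0)]
```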

As mentioned earlier, subqueries can be used to query results from multiple tables. To illustrate this with one final example, say you wanted to throw a surprise dinner for the group’s all-time best bowler. You could find which of your friends has the best bowling record and return their favorite meal with the following query:

SELECT name, entree, side, dessert
FROM dinners
WHERE name = (SELECT name FROM tourneys
              WHERE wins = (SELECT MAX(wins) FROM tourneys));

Output
  name  | entree | side  |  dessert
--------+--------+-------+-----------
 Gladys | steak  | fries | ice cream
(1 row)

Notice that this statement not only includes a subquery, but also contains a subquery within that subquery.
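
The same nesting can be sketched end to end with Python’s sqlite3 module and a hypothetical two-friend cut of the data: the innermost query finds the top win count, the middle query resolves it to a name, and the outer query looks up that friend’s meal.

```python
import sqlite3

# Hypothetical two-friend sample of the two tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE tourneys (name TEXT, wins INTEGER);
CREATE TABLE dinners  (name TEXT, entree TEXT);
INSERT INTO tourneys VALUES ('Etta', 4), ('Gladys', 13);
INSERT INTO dinners  VALUES ('Etta', 'salmon'), ('Gladys', 'steak');
""")

# Innermost: MAX(wins) -> 13; middle: the name with 13 wins -> 'Gladys';
# outer: Gladys's row in the dinners table.
rows = cur.execute("""
    SELECT name, entree FROM dinners
    WHERE name = (SELECT name FROM tourneys
                  WHERE wins = (SELECT MAX(wins) FROM tourneys))
""").fetchall()
print(rows)  # [('Gladys', 'steak')]
```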


Conclusion

Issuing queries is one of the most commonly performed tasks within the realm of database management. There are a number of database administration tools, such as phpMyAdmin or pgAdmin, that allow you to perform queries and visualize the results, but issuing SELECT statements from the command line is still a widely practiced workflow that can also provide you with greater control.



Originally published by Mark Drake at https://www.digitalocean.com

2020 PostgreSQL Trends Report

PostgreSQL popularity is skyrocketing in the enterprise space. As this open source database continues to pull new users from expensive commercial database management systems like Oracle, DB2, and SQL Server, organizations are adopting new approaches and evolving their own to maintain the exceptional performance of their SQL deployments.

We recently attended the PostgresConf event in San Jose to hear from the most active PostgreSQL user base on their database management strategies. In this latest PostgreSQL trends report, we analyze the most popular cloud providers for PostgreSQL, VACUUM strategies, query management strategies, and on-premises vs public cloud use being leveraged by enterprise organizations.


Most Popular Cloud Providers for PostgreSQL Hosting

Let’s start with the most popular cloud providers for PostgreSQL hosting. It comes as no surprise that the top three cloud providers in the world accounted for 100% of the PostgreSQL deployments reported in this enterprise survey. AWS, however, has taken a significant leap from our last report: it now averages 77.4% of PostgreSQL cloud use, compared to 55.0% in April. AWS does offer a managed hosting service for PostgreSQL called Amazon RDS, but there are many other DBaaS solutions that offer PostgreSQL hosting on AWS, such as ScaleGrid, that can provide multi-cloud support so you’re not locked in with a single cloud provider.

AWS was not the only cloud provider to grow – we found that 19.4% of PostgreSQL cloud deployments were hosted through Google Cloud Platform (GCP), growing 11% from April where they only averaged 17.5% of PostgreSQL hosting. This leaves our last cloud provider – Microsoft Azure, who represented 3.2% of PostgreSQL cloud deployments in this survey. This is one of the most shocking discoveries, as Azure was tied for second with GCP back in April, and is commonly a popular choice for enterprise organizations leveraging the Microsoft suite of services.

Most Used Languages With PostgreSQL

This is a new analysis we surveyed to see which languages are most popularly used with PostgreSQL. The supported programming languages for PostgreSQL include .Net, C, C++, Delphi, Java, JavaScript (Node.js), Perl, PHP, Python, and Tcl, but PostgreSQL can support many server-side procedural languages through its available extensions.

We found that Java is the most popular programming language for PostgreSQL, being leveraged by 31.1% of enterprise organizations on average. PostgreSQL can be easily connected with Java programs through the popular open source PostgreSQL Java Database Connectivity (JDBC) driver, also known as PgJDBC.

Python was the second most popular programming language used with PostgreSQL, coming in close at an average of 28.9% use with PostgreSQL. Back in 2013, PostgreSQL surveyed their users to see which external programming languages were most often used with PostgreSQL, and found that Python represented only 10.5% of the results, showing a massive increase in popularity over the past six years.

The programming language C came in third place, averaging 20.0% use with PostgreSQL, followed by Go in fourth at 13.3%, PL/pgSQL in fifth at 11.1%, Ruby in sixth at 8.9% and both PHP and Perl in seventh at 4.4%. PHP was actually the most popular language used with PostgreSQL in 2013, representing almost half of the responses from their survey at 47.1% use. The last column, Other, was represented by C++, Node.js, Javascript, Spark, Swift, Kotlin, Typescript, C#, Scala, R, .NET, Rust and Haskell.

Most Popular PostgreSQL VACUUM Strategies

PostgreSQL VACUUM is a process that removes tuples that have been deleted or made obsolete from their table, reclaiming the storage occupied by those dead tuples, which are also known as bloat. VACUUM is an important process to maintain, especially for frequently updated tables, before bloat starts affecting your PostgreSQL performance. In our survey, we asked enterprise PostgreSQL users how they are handling VACUUM to see what the most popular approaches are.

The most popular process for PostgreSQL VACUUM is the built-in autovacuum, being leveraged by 37.5% of enterprise organizations on average. The autovacuum daemon is optional but highly recommended in the PostgreSQL community, as it automates both the VACUUM and ANALYZE commands, continuously checking tables for dead tuples. While it is highly recommended, 33.3% of PostgreSQL users in the enterprise space prefer to perform VACUUM manually. Fibrevillage has a great article that outlines common problems with autovacuum which may cause an organization to adopt a manual strategy:

  • autovacuum may run even when turned off to deal with transaction ID wraparound.
  • autovacuum is constantly running, which makes it start over every time it runs out of space, and start a new worker for each database in your cluster.
  • autovacuum can cause out of memory errors.
  • autovacuum may have trouble keeping up on a busy server.
  • autovacuum can easily consume too much I/O capacity.

Another surprising discovery was that 18.8% of organizations do not use VACUUM, as it is not yet needed. This may be because they are leveraging PostgreSQL in small applications or applications that are not frequently updated. 6.6% of organizations have developed a custom solution for PostgreSQL VACUUM, and 4.2% are in the process of planning their VACUUM strategy.
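
For organizations that keep autovacuum but find it lagging on busy tables, PostgreSQL also allows its thresholds to be tuned per table rather than turning the daemon off. A hedged sketch using real storage parameters (the table name and threshold values below are hypothetical examples, not recommendations):

```sql
-- Make autovacuum trigger earlier on a frequently updated table
-- ("hot_orders" and the thresholds are hypothetical examples):
ALTER TABLE hot_orders SET (
    autovacuum_vacuum_scale_factor = 0.02,   -- vacuum once ~2% of rows are dead
    autovacuum_analyze_scale_factor = 0.01
);

-- A manual pass for comparison; plain VACUUM does not take an exclusive lock:
VACUUM (VERBOSE, ANALYZE) hot_orders;
```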

Most Popular PostgreSQL Slow Query Management Strategies

If you’re working with PostgreSQL, you likely know that managing queries is the most time-consuming task. It’s a critical process with many aspects to consider, starting with developing a query plan that matches your query structure to your data properties, then analyzing slow-running queries, and finally optimizing those queries through performance tuning.

We found that 54.3% of PostgreSQL users in enterprise organizations manage slow queries manually. This can be accomplished through the auto_explain and pg_stat_statements modules, checking pg_stat_activity for table and index activity on your server, analyzing the slow query log, or reviewing the queries in your code.
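
As an illustration of the manual route, a typical starting point is querying the pg_stat_statements view. (The extension must be enabled via shared_preload_libraries, and the timing columns shown here are named total_exec_time and mean_exec_time on PostgreSQL 13 and later; older releases call them total_time and mean_time.)

```sql
-- Five most expensive statements by cumulative execution time:
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```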

On average, 21.7% of enterprise organizations leverage a monitoring tool to analyze and manage their PostgreSQL slow queries. This helps them significantly reduce the time it takes to identify which queries run the slowest or most frequently, which cause the most read or write load on the system, and which are missing an index.

17.4% of users, however, are not actively managing slow queries in their PostgreSQL deployments. We highly recommend adopting a query management strategy to ensure slow queries are not affecting the performance of your PostgreSQL deployments. 4.3% of users are currently in the process of planning their query management strategy, and 2.2% have developed a custom solution for managing their slow queries.

PostgreSQL Cloud vs On-Premises Deployments

Let’s end with one of the hottest topics in the PostgreSQL enterprise space – whether to deploy PostgreSQL in the cloud or on-premises. We’ve been actively monitoring this trend all year, and found that 59.6% of PostgreSQL deployments were strictly on-premises back in April from our 2019 PostgreSQL Trends Report and 55.8% on-premises in our 2019 Open Source Database Report just a few months ago in June.

Now, in this most recent report, we found that PostgreSQL on-premises deployments have decreased by 40% since April of 2019. On average, only 35.6% of PostgreSQL enterprise organizations are deploying exclusively on-premises. But organizations are not abandoning their on-premises deployments altogether: 24.4% of PostgreSQL deployments were found to be leveraging a hybrid cloud environment. Hybrid clouds are a mix of on-premises, private cloud, and/or public cloud computing used to support applications and data. This is a significant increase from what we saw in April, jumping from 5.6% of PostgreSQL deployments up to 24.4% in September.

Hybrid cloud deployments are becoming more popular across the board: this recent report found that 57% of businesses opt for a hybrid cloud environment, using both private and public clouds as places to store their data. While we see a large jump to the cloud, enterprise organizations are still leveraging on-premises environments in some capacity 60% of the time, compared to 65.2% in April. Lastly, we found that public cloud PostgreSQL deployments have grown 15% since April, now averaging 40.0% use by enterprise organizations.

It’s also important to note that this survey was conducted at the PostgresConf Silicon Valley event, while our April survey was conducted in New York City. The bay area is widely known for adopting new technologies, which allows us to hypothesize that this market has a higher cloud adoption rate than the east coast.

| PostgreSQL Deployment Types | Apr   | Jun   | Sep   | Apr-Sep Growth |
|-----------------------------|-------|-------|-------|----------------|
| On-Premises                 | 59.6% | 55.8% | 35.6% | -40.0%         |
| Hybrid Cloud                | 5.6%  | 16.3% | 24.4% | 336%           |
| Public Cloud                | 34.8% | 27.9% | 40.0% | 15.0%          |

So, how do these results stack up to your PostgreSQL deployments and strategies?

Thanks for reading! Please share if you liked it!
