PostgreSQL with Nodejs

If you do not have particular requirement relational database systems are usually a good default choice, and particularly PostgreSQL, a very powerful open source SQL server. Moreover, if you develop for the Nodejs platform you will not run short of tools and solutions.

So what would be the ideal library or tool for your next application?

If you are lucky enough to work for a company which has resources to have its own dedicated database team, you will probably have only to use the abstractions they provide in your language of choice (Javascript in our case) to call the stored procedures they will have carefully crafted. In that case, you will eventually see no SQL (Structured Query Language, the language used in many relational database systems) at all and won’t go out of your comfort zone. In practice, this case may not occur very often and the truth is that many Nodejs developers or even so-called full stack developers know very little about SQL and will seek for various abstractions in their favorite packages repository.

Lately, I came across a very interesting article written by Thomas Hunter which were describing the different levels of abstraction a database related library may have. The idea was to dissuade you to use frameworks called ORM (Object Relational Mapping) and to convince you to learn SQL. Of course, as often in software development, each kind of abstraction has its pros and cons and let’s review quickly what I learned in Thomas’ article.

If you want more details you should probably read Thomas’ article and skip the sections below.

Drivers

Drivers provide almost no abstraction at all. They usually only handle connections and the communication protocol with the database. They may also parse the responses into adapted data structures (most of the time in Javascript it will be JSON objects and arrays). The *de facto *PostgreSQL driver for Nodejs is pg.

Drivers are normally very fast and the most flexible solution as they will let you manually write SQL queries yourself. On the other hand, it might quickly become tedious and dangerous to write dynamic queries by hand as it would involve a lot of string manipulation.

Query builders

With query builders, instead of manipulating strings, you will be using functions whose API is usually close to the SQL grammar. In Javascript, functions compose very well with patterns such as partial application, decoration, composition, recursion, higher-order function, etc.

They will often let you create your own abstractions and solve many problems in an elegant way. They also provide a framework with more explicit constraints than strings can do (especially if you use some sort of types on top of the normal signatures) which usually makes you avoid making mistakes and fit better in your everyday tools (IDE and so).

A very popular query builders library is knex.

ORM

ORM is the highest level of abstraction and such frameworks come with usually a lot of features. The promise is to create a data model domain with idioms you know such classes, objects, etc; and then the framework will translate it into a database structure and convenient ways to query it through the models you will have defined. The obvious advantage is that you can handle everything in your language of choice and with concepts familiar to you, the underlying database entities and queries will stay very far from your application code. However, the way we think the application data model sometimes does not quite fit the way we would build a database for that model: it is known as the object-relational impedance mismatch. Such libraries have been creating passionate debates between their fans and the others. Of course, they definitely have their usage and the problem is not black or white, but there are recurrent criticisms:

When you learn an ORM (and this is already many hours of work) you learn syntax and concepts you probably will not be used outside of this particular ORM.
The SQL queries generated by an ORM may not be very efficient in the sense that ORM will have to tackle specialized queries in a more generalist way. Moreover, performances can also degrade as the ORM will often not return raw data but some model instances of it (in order to provide some features like validation, lazy loading, etc). You often trade performances for features.
ORM cannot generate some (basic sometimes) queries or have it wrong when used through their models. This problem is however easily worked around as they provide a low-level query interface to write the queries you want directly with the underlying driver.
You may not design the database, the database schema can be generated for you which may sometimes lead to very poor database design.

The most popular ORM for Nodejs is Sequelize.

I think we can do better…

Introducing Ship-Hold

Note all the examples will use the database generated by the script in annexes at the end of this article
In the 1970s (yes almost 50 years ago!). Edgar F. Codd created a Relational Algebra. SQL became the most common implementation of the algebra and has evolved a lot to become one of the most powerful query language and one of the most battle-tested. It includes now procedure capabilities and extensions (especially with PostgreSQL) which go beyond the scope of the algebra. Clearly, SQL is often neglected by application developers whereas its capabilities are enormous.

Through products (joins) you can cross data from different tables and organize your result data in infinite ways whereas with SQL functions you can aggregate data to format it the way you want. A single query although quite verbose is probably way more flexible than what you could do with Javascript even by composing small functions together.

You want to fetch all the posts including a field *author *with the first name and the last name of the author:

SELECT 
    “posts”.*, 
    concat_ws(‘ ‘,”users”.”first_name”,”users”.”last_name”) as “author” 
FROM “posts” JOIN “users” USING(“user_id”);

Actually, you prefer to include all author’s data as a JSON object:

SELECT 
    “posts”.*, 
    to_json(“users”.*) as “author” 
FROM “posts” JOIN “users” USING(“user_id”);

Or you want to know who are the five users who have published the most comments:

WITH 
    “commenters” AS (SELECT COUNT(*),”user_id” FROM “comments” GROUP BY “user_id” ORDER BY “count” DESC LIMIT 5)
SELECT “users”.* FROM “commenters” JOIN “users” USING (“user_id”);

Doing the same things in Javascript considering your table as arrays would require a great deal of code.

You are convinced and decide to learn SQL: great! What if you could write in Javascript with function calls, SQL queries the way you would write them in pure SQL? Good news, with ship-hold you can.

Let’s say you want to find all the posts about SQL published after 2010 including their author’s data ordered by publication date:

const posts = await sh
    .select('posts.*',{
        value: toJson('"users".*'),
        as:'author'
    })
    .from('posts')
    .where('title','ilike','%sql%')
    .and('published_at','>',new Date(2010,1,1))
    .join('users')
    .on('posts.user_id','"users"."user_id"')
    .orderBy('published_at','desc')
    .run();

And ship hold query builders are regular functions so you can add all the syntactic sugar you want. Even better you can compose builders together as you would do with SQL subqueries. For example, you could get the same result for the third example with the following code:

// A higher order function
const fromTable = table => (...args) => sh.select(...args).from(table);

const fromPosts = fromTable('posts');

const fromUsers = fromTable('users');
// A sub query
const getTopFiveCommenters = {
    value: fromPosts(count('*'), 'user_id')
        .groupBy('user_id')
        .orderBy('count', 'desc')
        .limit(5),
    as: 'talkative'
};

const users = await fromUsers()
    .where('user_id', 'IN', sh.select('user_id').from(getTopFiveCommenters))
    .run();

It basically has the power of SQL with the expressiveness of Javascript functions!

Eager loading: a not so easy problem.

SQL is very powerful but at some point, it becomes quite complicated for a human being to write a query to get the expected result. It is particularly true when you want to fetch related data from different tables and start to apply pagination to your result set or to a sub-result set. That’s one of the reasons ORM exist: they eventually help you to generate such queries.

Let’s say you want to get users with the posts they have written in case they have written some:

SELECT * FROM “users” LEFT JOIN “posts” USING (“user_id”)

Ok, now you want the users whose last name is Renard, and you want only the five first Renard ordered by alphanumeric order on their first name. Of course you still want the articles they eventually have written too.

SELECT * FROM “users” LEFT JOIN “posts” USING (“user_id”) WHERE “last_name” = ‘Renard’ ORDER BY “first_name” LIMIT 5

Is it correct ? Actually no: The limit will be applied to the join and as various rows may correspond to the same user (one for each written article) with this query you’ll be missing data.

SELECT * FROM (SELECT * FROM “users” WHERE “last_name”=’Renard’ ORDER BY “first_name” LIMIT 5) as “users” LEFT JOIN “posts” USING (“user_id”) ORDER BY “first_name”

This query would do the job, although the data parsed by the driver would then need to be aggregated: again various rows may correspond to the same user!

Now you also want to paginate the posts included: instead of getting all the articles written by a given user you want the three last published about Javascript. Ah, and you also want to include within these articles the five last published comment with their author’s data…

Ok, you get the point, it becomes quite complicated to write a query which will get you the right result.

Thankfully ORM and such will help you to get the right result. How they do may vary from one library to another. Some will make multiple queries then buffer the result and aggregate it programmatically. Others will inline the join queries, etc. A different strategy can get you the result but more or less efficiently.

Ship hold is not an ORM but it gives you the ability to define services related to a particular table. By using a service, it will simplify the syntax to make a query against a particular table, but it also gives you the ability to define relations between services. It will greatly help you then to write queries with inclusions. You basically only have to compose builders as you would construct your sentence in English.

const Users = sh.service({
    table: 'users',
    primaryKey: 'user_id'
});

const Posts = sh.service({
    table: 'posts',
    primaryKey: 'post_id'
});

const Comments = sh.service({
    table: 'comments',
    primaryKey: 'comment_id'
});
Users.hasMany(Posts);
Posts.belongsTo(Users, 'user_id', 'author');

Users.hasMany(Comments);
Posts.hasMany(Comments);
Comments.belongsTo(Users, 'user_id', 'author');
Comments.belongsTo(Posts, 'post_id', 'article');

Then get the result for our very complicated request aforementioned:

// Five last comments with their authors' data
const fiveLastPublishedComments = Comments
    .select()
    .orderBy('published_at', 'desc')
    .limit(5)
    .include(Users);
// Three last Javascript posts
const threeLastPostAboutJavascript = Posts
    .select()
    .where('title', 'ilike', '%javascript%')
    .orderBy('published_at', 'desc')
    .limit(3)
    .include(fiveLastPublishedComments);

// finally
const renards = await Users
    .select()
    .where('last_name', 'Renard')
    .orderBy('first_name')
    .limit(5)
    .include(threeLastPostAboutJavascript)
    .run();

And done!

I dare you to do it so easily with an other tool :)

About performances

You might wonder how good the queries generated will perform. It is very difficult to compare data access libraries together as the result might change quite a lot depending on the database itself: the indexes created, the amount of data, etc.

However, ship-hold tries to leverage as much as possible the database features where a lot of ORM will duplicate the effort in their Javascript code base. A good example is the aggregation. Where many ORM will perform several queries buffer the result (often with redundant data) then aggregate the result programmatically, ship-hold will ask the database to do the aggregation with its built-in mechanisms. It is also one of the reasons ship-hold is small compared to other libraries. The weight of the installed bundled is 771kbincluding pg driver and others dependencies.

Database makers strive to improve performances, and write usually faster code in C++, with better algorithms that I personally can do on the JS side.

Any performance improvement on the database engine would, therefore, improve ship-hold. Moreover, join queries and subqueries carry a part of redundant data. If you do the aggregation at the database level, I suspect that less data would need to go throw the wires which might improve the performances compared to a solution where data is buffered on the JS side and cleaned/aggregated. It could be especially true if your database is on a remote server.

Nevertheless, you can have a look at the numbers of a sample applicationimplemented with various popular libraries.

If you think ship-hold worth it, please share the project link and star the repository or clap this article. If you have any recommendations to improve it don’t hesitate to contribute. Thanks for reading.

Annexes

Database script

DROP TABLE IF EXISTS posts, users, comments, tags, posts_tags CASCADE;
CREATE TABLE users (
    user_id serial PRIMARY KEY,
    email VARCHAR(100) UNIQUE NOT NULL,
    biography TEXT,
    first_name VARCHAR(100) NOT NULL, 
    last_name VARCHAR(100) NOT NULL 
);

CREATE TABLE posts(
    post_id serial PRIMARY KEY,
    title VARCHAR NOT NULL,
    content TEXT,
    published_at TIMESTAMP,
    user_id integer REFERENCES users (user_id)
);

CREATE TABLE comments(
    comment_id serial PRIMARY KEY,
    content TEXT,
    published_at TIMESTAMP,
    user_id integer REFERENCES users (user_id),
    post_id integer REFERENCES posts (post_id)
);

CREATE TABLE tags(
    tag VARCHAR PRIMARY KEY,
    description TEXT
);

CREATE TABLE posts_tags(
    post_id INTEGER REFERENCES posts,
    tag VARCHAR REFERENCES tags,
    PRIMARY KEY (post_id, tag)
);

By : Laurent Renard

#postgresql #node.js