PostgreSQL, select rows if unique

I'm trying to do a two part query and return a text array of semicolon delimited column values.

I'm trying to do a two part query and return a text array of semicolon delimited column values.

First, select only rows that are unique based on values of three columns (i.e. if the three tuple of values exists more than once, it is not unique). Of the remaining rows, perform another filer based on a fourth column.

This is how I think about it, but maybe there's a better solution.

I've tried this a few different ways. My current attempt is using CTE:

with uniqe as (
    select distinct on (
        col1,
        col2,
        col3
    ) *
    from MyTable
)
select concat(col::text, ';', col2::text, ';', col3)
    as key
    from uniqe
    where upper(dateRange) <= (now() - interval '1 days')
    order by key;

The issue I'm running into is that SELECT DISTINCT ON (col1, col2, col3) ... seems to pick at least 1 row of the others that I'm not considering "unique".

Just to be clear, here's a sample table:

 id  | col1 | col2 | col3 |                       dateRange
-----+------+------+------+-------------------------------------------------------
  1  |   1  |   1  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  2  |   1  |   1  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  3  |   1  |   1  |  B   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  4  |   1  |   2  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  5  |   2  |   1  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  6  |   2  |   1  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  7  |   1  |   2  |  B   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  8  |   1  |   2  |  B   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")

I consider rows 3 & 4 to be the only unique rows.

PostgreSQL Tutorial - Learn PostgreSQL Database from Scratch

PostgreSQL Tutorial - Learn PostgreSQL Database from Scratch

PostgreSQL Tutorial - Learn PostgreSQL Database from Scratch. Learn how to use PostgreSQL in this full course. PostgreSQL is a general purpose and object-relational database management system. It is the most advanced open source database system widely used to build back-end systems. What is a Database? What is SQL And Relational Database? What is PostreSQL AKA Postrgres? PostgreSQL Installation. How to Create Database.

Learn PostgreSQL Tutorial - Full Course for Beginners

Learn how to use PostgreSQL Database in this full course.

PostgreSQL is a general purpose and object-relational database management system. It is the most advanced open source database system widely used to build back-end systems.

⭐️ Contents ⭐️
⌨️ (0:03:16) What is a Database
⌨️ (0:05:17) What is SQL And Relational Database
⌨️ (0:09:10) What is PostreSQL AKA Postrgres
⌨️ (0:10:53) PostgreSQL Installation (Mac OS)
⌨️ (0:14:21) PostgreSQL Installation (Windows)
⌨️ (0:17:38) GUI Clients vs Terminal/CMD Clients
⌨️ (0:21:39) Setup PSQL (MAC OS)
⌨️ (0:25:22) Setup PSQL (Windows)
⌨️ (0:30:15) How to Create Database
⌨️ (0:33:35) How to Connect to Databases
⌨️ (0:38:12) A Very Dangerous Command
⌨️ (0:41:37) How To Create Tables
⌨️ (0:45:46) Creating Tables Without Constraints
⌨️ (0:49:12) Creating Tables with Constraints
⌨️ (0:55:55) Insert Into
⌨️ (0:59:14) Insert Into Example
⌨️ (1:02:36) Generate 1000 Rows with Mockaroo
⌨️ (1:12:28) Select From
⌨️ (1:15:18) Order By
⌨️ (1:19:53) Distinct
⌨️ (1:21:59) Where Clause and AND
⌨️ (1:25:29) Comparison Operators
⌨️ (1:29:35) Limit, Offset & Fetch
⌨️ (1:32:43) IN
⌨️ (1:35:43) Between
⌨️ (1:37:45) Like And iLike
⌨️ (1:43:10) Group By
⌨️ (1:46:41) Group By Having
⌨️ (1:52:08) Adding New Table And Data Using Mockaroo
⌨️ (1:55:40) Calculating Min, Max & Average
⌨️ (1:59:48) Sum
⌨️ (2:01:55) Basics of Arithmetic Operators
⌨️ (2:05:59) Arithmetic Operators (ROUND)
⌨️ (2:09:43) Alias
⌨️ (2:12:32) Coalesce
⌨️ (2:16:15) NULLIF
⌨️ (2:20:21) Timestamps And Dates Course
⌨️ (2:23:21) Adding And Subtracting With Dates
⌨️ (2:25:58) Extracting Fields From Timestamp
⌨️ (2:27:28) Age Function
⌨️ (2:29:24) What Are Primary Keys
⌨️ (2:31:23) Understanding Primary Keys
⌨️ (2:36:26) Adding Primary Key
⌨️ (2:40:55) Unique Constraints
⌨️ (2:49:15) Check Constraints
⌨️ (2:54:45) How to Delete Records
⌨️ (3:01:36) How to Update Records
⌨️ (3:05:55) On Conflict Do Nothing
⌨️ (3:11:09) Upsert
⌨️ (3:16:41) What Is A Relationship/Foreign Keys
⌨️ (3:19:48) Adding Relationship Between Tables
⌨️ (3:25:04) Updating Foreign Keys Columns
⌨️ (3:29:30) Inner Joins
⌨️ (3:35:17) Left Joins
⌨️ (3:40:53) Deleting Records With Foreign Keys
⌨️ (3:47:27) Exporting Query Results to CSV
⌨️ (3:50:42) Serial & Sequences
⌨️ (3:57:18) Extensions
⌨️ (3:59:39) Understanding UUID Data Type
⌨️ (4:05:54) UUID As Primary Keys
⌨️ (4:16:30) Conclusion

Learn & Use PostgreSQL - Getting Started with Sequelize and PostgreSQL

Learn & Use PostgreSQL - Getting Started with Sequelize and PostgreSQL

In this tutorial, we will learn about NodeJS ORM called Sequelize interfacing with PostgreSQL Database. We are going to learn and use PostgreSQL. Sequelize is the NodeJS module which provides common functionalities and utilities to manipulate SQL based databases. Technically, it is an ORM ( Object relational mapping ) module for NodeJS. It supports the following databases: MySQL, PostgreSQL, MariaDB, SQLite, MSSQL

In this tutorial, we will learn about NodeJS ORM called Sequelize interfacing with PostgreSQL Database.

Introducing Sequelize

Sequelize is the NodeJS module which provides common functionalities and utilities to manipulate SQL based databases. Technically, it is an ORM ( Object relational mapping ) module for NodeJS. It supports the following databases:

  • MySQL
  • PostgreSQL
  • MariaDB.
  • SQLite.
  • MSSQL

We have already covered tutorials on MySQL, SQLite, and MariaDB. In this tutorial, we are going to learn and use PostgreSQL.

About PostgreSQL

PostgreSQL is considered to be the most advanced relational database system. It offers amazing features such as multi-version concurrency control, reliability, availability and strong data types backed by the 15 years of research and development.

I have not personally used it in a production system hence cannot provide any live experience. If any of you used it, please provide your feedback in comments.

We will use this to test out the Sequelize ORM. To begin with, visit the official site of PostgreSQL and download it. It supports all Major operating system.

Once you have installed it, open up the PostgreSQL admin screen. It will look something like this.


Awesome! PostgreSQL is up and running!

Getting Started with Sequelize and PostgreSQL

Let’s get started with Sequelize. To begin with, create a new folder and switch to it using Terminal.

Then run the following command to generate a new fresh NodeJS project.

npm init --y

This generate the following:

{
  "name": "sequalize-postgres",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo "Error: no test specified" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC"
}

Then, install node dependencies for Sequelize and PostgreSQL using the following command.

npm i --save sequelize pg pg-hstore

We will do the following task with Sequelize and PostgreSQL:

  • Connecting to the Database
  • Creating the model
  • Performing CRUD Operation
  • Monitoring Queries

Connecting to the Database

Create new file say app.js and add the following code.

const Sequelize = require('sequelize');
const sequelize = new Sequelize('codeforgeek', 'postgres', 'shahid', {
  host: 'localhost',
  dialect: 'postgres',
  pool: {
    max: 9,
    min: 0,
    idle: 10000
  }
});

sequelize.authenticate().then(() => {
  console.log("Success!");
}).catch((err) => {
  console.log(err);
});

Run the code using the following command.

node app.js

Since we haven’t created the database in PostgreSQL, it will return the following error ( You will receive more fields ).

{
  name: 'SequelizeConnectionError',
  message: 'database "codeforgeek" does not exist'
}

To create the database, open your PostgreSQL Admin.

Give your database a proper name and choose the user.

Now, run the same code and you should see the message as shown below.

Let’s create tables inside the PostgreSQL using Sequelize.

Creating model using Sequelize

Place this code inside the authenticate() success block.

sequelize.authenticate().then(() => {
  console.log("Success!");
  var Posts = sequelize.define('posts', {
    title: {
      type: Sequelize.STRING
    },
    content: {
      type: Sequelize.STRING
    }
  }, {
    freezeTableName: true
  });

  Posts.sync({force: true}).then(function () {
    return Posts.create({
      title: 'Getting Started with PostgreSQL and Sequelize',
      content: 'Hello there'
    });
  });
}).catch((err) => {
  console.log(err);
});

Run the code and you should be seeing queries running in the terminal. Similar to what is shown below.

Performing CRUD Operation

We have already created a data in the table using the code shown above. Let’s do some reading.

To read something from SQL based databases, we use SELECT query. Something like this.

SELECT * FROM `posts` WHERE `id`=2019;

Sequelize provides functions to perform database reads. The function is findAll() which accepts various parameters to cover clauses such as WHERE, JOINS etc and returns promise function.

To read the data from the Posts table, we will use the following code.

 Posts.findAll({}).then((data) => {
    console.log(data);
 }).catch((err) => {
    console.log(err);
 });

If we want to add WHERE clause in it, then it can be tweaked to the following code.

 Posts.findAll({
  where: {
    id: '100'
  }
 }).then((data) => {
    console.log(data);
 }).catch((err) => {
    console.log(err);
 });

You can find all read operation combinations here.

You can update any fields using update() method. Here is the sample code.

 Posts.update({
    content: 'This is a tutorial to learn Sequelize and PostgreSQL'
  }, {
    where: {
      id: 1
    }
  }).then(() => {
    console.log('Updated');
  }).catch((e) => {
    console.log("Error"+e);
  });

You can delete the rows using destroy() function.

  Posts.destroy({where: {
    id: 1
  }}).then(() => {
    console.log("Deleted");
  }).catch((e) => {
    console.log("Error"+e);
  });

This covers all four CRUD operation.

Monitoring Queries

Sequelize runs SQL queries in the background to deal with the database. In the terminal, you can view those queries as well. For example, refer to the screenshot below.

You can monitor those queries live, on each method call, Sequelize executes these queries.

Conclusion

I have tried and tested various node modules to interface with a variety of databases. It’s tough to maintain the track and keeping in touch with the development of these modules because there is too many of them. I am glad something like Sequelize came up to solve these issue. One module, multiple databases is what I needed!

Let me know if you are going to work with Sequelize or share your experience if you have already used it.

PostgreSQL with Nodejs

Most of the applications need at some point to persist data. It can be through files, local storage, cloud services or often databases

If you do not have particular requirement relational database systems are usually a good default choice, and particularly PostgreSQL, a very powerful open source SQL server. Moreover, if you develop for the Nodejs platform you will not run short of tools and solutions.

So what would be the ideal library or tool for your next application?

If you are lucky enough to work for a company which has resources to have its own dedicated database team, you will probably have only to use the abstractions they provide in your language of choice (Javascript in our case) to call the stored procedures they will have carefully crafted. In that case, you will eventually see no SQL (Structured Query Language, the language used in many relational database systems) at all and won’t go out of your comfort zone. In practice, this case may not occur very often and the truth is that many Nodejs developers or even so-called full stack developers know very little about SQL and will seek for various abstractions in their favorite packages repository.

Lately, I came across a very interesting article written by Thomas Hunter which were describing the different levels of abstraction a database related library may have. The idea was to dissuade you to use frameworks called ORM (Object Relational Mapping) and to convince you to learn SQL. Of course, as often in software development, each kind of abstraction has its pros and cons and let’s review quickly what I learned in Thomas’ article.

If you want more details you should probably read Thomas’ article and skip the sections below.

Drivers

Drivers provide almost no abstraction at all. They usually only handle connections and the communication protocol with the database. They may also parse the responses into adapted data structures (most of the time in Javascript it will be JSON objects and arrays). The *de facto *PostgreSQL driver for Nodejs is pg.

Drivers are normally very fast and the most flexible solution as they will let you manually write SQL queries yourself. On the other hand, it might quickly become tedious and dangerous to write dynamic queries by hand as it would involve a lot of string manipulation.

Query builders

With query builders, instead of manipulating strings, you will be using functions whose API is usually close to the SQL grammar. In Javascript, functions compose very well with patterns such as partial application, decoration, composition, recursion, higher-order function, etc.

They will often let you create your own abstractions and solve many problems in an elegant way. They also provide a framework with more explicit constraints than strings can do (especially if you use some sort of types on top of the normal signatures) which usually makes you avoid making mistakes and fit better in your everyday tools (IDE and so).

A very popular query builders library is knex.

ORM

ORM is the highest level of abstraction and such frameworks come with usually a lot of features. The promise is to create a data model domain with idioms you know such classes, objects, etc; and then the framework will translate it into a database structure and convenient ways to query it through the models you will have defined. The obvious advantage is that you can handle everything in your language of choice and with concepts familiar to you, the underlying database entities and queries will stay very far from your application code. However, the way we think the application data model sometimes does not quite fit the way we would build a database for that model: it is known as the object-relational impedance mismatch. Such libraries have been creating passionate debates between their fans and the others. Of course, they definitely have their usage and the problem is not black or white, but there are recurrent criticisms:

  • When you learn an ORM (and this is already many hours of work) you learn syntax and concepts you probably will not be used outside of this particular ORM.
  • The SQL queries generated by an ORM may not be very efficient in the sense that ORM will have to tackle specialized queries in a more generalist way. Moreover, performances can also degrade as the ORM will often not return raw data but some model instances of it (in order to provide some features like validation, lazy loading, etc). You often trade performances for features.
  • ORM cannot generate some (basic sometimes) queries or have it wrong when used through their models. This problem is however easily worked around as they provide a low-level query interface to write the queries you want directly with the underlying driver.
  • You may not design the database, the database schema can be generated for you which may sometimes lead to very poor database design.

The most popular ORM for Nodejs is Sequelize.

I think we can do better…

Introducing Ship-Hold

Note all the examples will use the database generated by the script in annexes at the end of this article
In the 1970s (yes almost 50 years ago!). Edgar F. Codd created a Relational Algebra. SQL became the most common implementation of the algebra and has evolved a lot to become one of the most powerful query language and one of the most battle-tested. It includes now procedure capabilities and extensions (especially with PostgreSQL) which go beyond the scope of the algebra. Clearly, SQL is often neglected by application developers whereas its capabilities are enormous.

Through products (joins) you can cross data from different tables and organize your result data in infinite ways whereas with SQL functions you can aggregate data to format it the way you want. A single query although quite verbose is probably way more flexible than what you could do with Javascript even by composing small functions together.

You want to fetch all the posts including a field *author *with the first name and the last name of the author:

SELECT 
    “posts”.*, 
    concat_ws(‘ ‘,”users”.”first_name”,”users”.”last_name”) as “author” 
FROM “posts” JOIN “users” USING(“user_id”);

Actually, you prefer to include all author’s data as a JSON object:

SELECT 
    “posts”.*, 
    to_json(“users”.*) as “author” 
FROM “posts” JOIN “users” USING(“user_id”);

Or you want to know who are the five users who have published the most comments:

WITH 
    “commenters” AS (SELECT COUNT(*),”user_id” FROM “comments” GROUP BY “user_id” ORDER BY “count” DESC LIMIT 5)
SELECT “users”.* FROM “commenters” JOIN “users” USING (“user_id”);

Doing the same things in Javascript considering your table as arrays would require a great deal of code.

You are convinced and decide to learn SQL: great! What if you could write in Javascript with function calls, SQL queries the way you would write them in pure SQL? Good news, with ship-hold you can.

Let’s say you want to find all the posts about SQL published after 2010 including their author’s data ordered by publication date:

const posts = await sh
    .select('posts.*',{
        value: toJson('"users".*'),
        as:'author'
    })
    .from('posts')
    .where('title','ilike','%sql%')
    .and('published_at','>',new Date(2010,1,1))
    .join('users')
    .on('posts.user_id','"users"."user_id"')
    .orderBy('published_at','desc')
    .run();

And ship hold query builders are regular functions so you can add all the syntactic sugar you want. Even better you can compose builders together as you would do with SQL subqueries. For example, you could get the same result for the third example with the following code:

// A higher order function
const fromTable = table => (...args) => sh.select(...args).from(table);

const fromPosts = fromTable('posts');

const fromUsers = fromTable('users');
// A sub query
const getTopFiveCommenters = {
    value: fromPosts(count('*'), 'user_id')
        .groupBy('user_id')
        .orderBy('count', 'desc')
        .limit(5),
    as: 'talkative'
};

const users = await fromUsers()
    .where('user_id', 'IN', sh.select('user_id').from(getTopFiveCommenters))
    .run();

It basically has the power of SQL with the expressiveness of Javascript functions!

Eager loading: a not so easy problem.

SQL is very powerful but at some point, it becomes quite complicated for a human being to write a query to get the expected result. It is particularly true when you want to fetch related data from different tables and start to apply pagination to your result set or to a sub-result set. That’s one of the reasons ORM exist: they eventually help you to generate such queries.

Let’s say you want to get users with the posts they have written in case they have written some:

SELECT * FROM “users” LEFT JOIN “posts” USING (“user_id”)

Ok, now you want the users whose last name is Renard, and you want only the five first Renard ordered by alphanumeric order on their first name. Of course you still want the articles they eventually have written too.

SELECT * FROM “users” LEFT JOIN “posts” USING (“user_id”) WHERE “last_name” = ‘Renard’ ORDER BY “first_name” LIMIT 5

Is it correct ? Actually no: The limit will be applied to the join and as various rows may correspond to the same user (one for each written article) with this query you’ll be missing data.

SELECT * FROM (SELECT * FROM “users” WHERE “last_name”=’Renard’ ORDER BY “first_name” LIMIT 5) as “users” LEFT JOIN “posts” USING (“user_id”) ORDER BY “first_name”

This query would do the job, although the data parsed by the driver would then need to be aggregated: again various rows may correspond to the same user!

Now you also want to paginate the posts included: instead of getting all the articles written by a given user you want the three last published about Javascript. Ah, and you also want to include within these articles the five last published comment with their author’s data…

Ok, you get the point, it becomes quite complicated to write a query which will get you the right result.

Thankfully ORM and such will help you to get the right result. How they do may vary from one library to another. Some will make multiple queries then buffer the result and aggregate it programmatically. Others will inline the join queries, etc. A different strategy can get you the result but more or less efficiently.

Ship hold is not an ORM but it gives you the ability to define services related to a particular table. By using a service, it will simplify the syntax to make a query against a particular table, but it also gives you the ability to define relations between services. It will greatly help you then to write queries with inclusions. You basically only have to compose builders as you would construct your sentence in English.

const Users = sh.service({
    table: 'users',
    primaryKey: 'user_id'
});

const Posts = sh.service({
    table: 'posts',
    primaryKey: 'post_id'
});

const Comments = sh.service({
    table: 'comments',
    primaryKey: 'comment_id'
});
Users.hasMany(Posts);
Posts.belongsTo(Users, 'user_id', 'author');

Users.hasMany(Comments);
Posts.hasMany(Comments);
Comments.belongsTo(Users, 'user_id', 'author');
Comments.belongsTo(Posts, 'post_id', 'article');

Then get the result for our very complicated request aforementioned:

// Five last comments with their authors' data
const fiveLastPublishedComments = Comments
    .select()
    .orderBy('published_at', 'desc')
    .limit(5)
    .include(Users);
// Three last Javascript posts
const threeLastPostAboutJavascript = Posts
    .select()
    .where('title', 'ilike', '%javascript%')
    .orderBy('published_at', 'desc')
    .limit(3)
    .include(fiveLastPublishedComments);

// finally
const renards = await Users
    .select()
    .where('last_name', 'Renard')
    .orderBy('first_name')
    .limit(5)
    .include(threeLastPostAboutJavascript)
    .run();

And done!

I dare you to do it so easily with an other tool :)

About performances

You might wonder how good the queries generated will perform. It is very difficult to compare data access libraries together as the result might change quite a lot depending on the database itself: the indexes created, the amount of data, etc.

However, ship-hold tries to leverage as much as possible the database features where a lot of ORM will duplicate the effort in their Javascript code base. A good example is the aggregation. Where many ORM will perform several queries buffer the result (often with redundant data) then aggregate the result programmatically, ship-hold will ask the database to do the aggregation with its built-in mechanisms. It is also one of the reasons ship-hold is small compared to other libraries. The weight of the installed bundled is 771kbincluding pg driver and others dependencies.

Database makers strive to improve performances, and write usually faster code in C++, with better algorithms that I personally can do on the JS side.

Any performance improvement on the database engine would, therefore, improve ship-hold. Moreover, join queries and subqueries carry a part of redundant data. If you do the aggregation at the database level, I suspect that less data would need to go throw the wires which might improve the performances compared to a solution where data is buffered on the JS side and cleaned/aggregated. It could be especially true if your database is on a remote server.

Nevertheless, you can have a look at the numbers of a sample applicationimplemented with various popular libraries.

If you think ship-hold worth it, please share the project link and star the repository or clap this article. If you have any recommendations to improve it don’t hesitate to contribute. Thanks for reading.

Annexes

Database script

DROP TABLE IF EXISTS posts, users, comments, tags, posts_tags CASCADE;
CREATE TABLE users (
    user_id serial PRIMARY KEY,
    email VARCHAR(100) UNIQUE NOT NULL,
    biography TEXT,
    first_name VARCHAR(100) NOT NULL, 
    last_name VARCHAR(100) NOT NULL 
);

CREATE TABLE posts(
    post_id serial PRIMARY KEY,
    title VARCHAR NOT NULL,
    content TEXT,
    published_at TIMESTAMP,
    user_id integer REFERENCES users (user_id)
);

CREATE TABLE comments(
    comment_id serial PRIMARY KEY,
    content TEXT,
    published_at TIMESTAMP,
    user_id integer REFERENCES users (user_id),
    post_id integer REFERENCES posts (post_id)
);

CREATE TABLE tags(
    tag VARCHAR PRIMARY KEY,
    description TEXT
);

CREATE TABLE posts_tags(
    post_id INTEGER REFERENCES posts,
    tag VARCHAR REFERENCES tags,
    PRIMARY KEY (post_id, tag)
);

By : Laurent Renard