Data Structures and Indexing for Operations on Deeply Nested Comments

You may wonder whether this problem is still relevant in 2020. The answer is: very much so! Despite the great success of Neo4j, a graph database, and other NoSQL databases, such databases are still rarely seen in non-startup Spring Boot projects. So, if you are a Java developer working on such a project and are looking for a JPQL-only solution to this problem, the proposed approach may be helpful to you.

This post demonstrates the data structure and algorithms, analyzes their complexity, and reviews the literature. If you are looking for an actual Spring Initializr project, go to Part 2. So, let us start.

The Requirements

We will build a CRM system for a mid-size online education center. The system is implemented with Spring Boot 2, H2 (development), and (for now) MS SQL (production). Among many other functions, the system must be able to create, read, update, and delete nested comments stored in a relational database. The functional requirements for the comment functionality are:

The most common operations are to find all descendants of a particular comment and to find all root comments,
All new and edited comments are moderated before being published and saved into the database,
If a comment is deleted, all its child comments are deleted as well,
The system should support full-text search over comments’ content.
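
To make the first and third requirements concrete, here is a minimal in-memory sketch of the operations they describe: finding root comments, finding all descendants of a comment, and deleting a comment together with its subtree. The class and method names (`CommentTree`, `findDescendants`, `deleteCascade`) are hypothetical and chosen for illustration only; this is not the database-backed solution the post goes on to develop, just a reference for the expected behavior.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of nested comments (illustration only). */
class CommentTree {
    // Parent id of each comment; a null value marks a root comment.
    private final Map<Long, Long> parentOf = new HashMap<>();
    // Child ids of each comment, in insertion order.
    private final Map<Long, List<Long>> childrenOf = new HashMap<>();

    void add(long id, Long parentId) {
        if (parentOf.containsKey(id)) {
            throw new IllegalArgumentException("Duplicate comment id " + id);
        }
        if (parentId != null && !parentOf.containsKey(parentId)) {
            throw new IllegalArgumentException("Unknown parent id " + parentId);
        }
        parentOf.put(id, parentId);
        childrenOf.put(id, new ArrayList<Long>());
        if (parentId != null) {
            childrenOf.get(parentId).add(id);
        }
    }

    /** Returns the ids of all root comments, sorted ascending. */
    List<Long> findRoots() {
        List<Long> roots = new ArrayList<>();
        for (Map.Entry<Long, Long> entry : parentOf.entrySet()) {
            if (entry.getValue() == null) {
                roots.add(entry.getKey());
            }
        }
        Collections.sort(roots);
        return roots;
    }

    /** Returns all descendants of the given comment in breadth-first order. */
    List<Long> findDescendants(long id) {
        List<Long> result = new ArrayList<>();
        Deque<Long> queue = new ArrayDeque<>(
                childrenOf.getOrDefault(id, Collections.<Long>emptyList()));
        while (!queue.isEmpty()) {
            Long current = queue.poll();
            result.add(current);
            queue.addAll(childrenOf.get(current));
        }
        return result;
    }

    /** Deletes the comment and every comment below it. */
    void deleteCascade(long id) {
        if (!parentOf.containsKey(id)) {
            return;
        }
        List<Long> doomed = findDescendants(id);
        doomed.add(id);
        Long parentId = parentOf.get(id);
        if (parentId != null) {
            childrenOf.get(parentId).remove(Long.valueOf(id));
        }
        for (Long commentId : doomed) {
            parentOf.remove(commentId);
            childrenOf.remove(commentId);
        }
    }

    int size() {
        return parentOf.size();
    }
}
```

In memory, each of these operations is a simple traversal. The difficulty discussed in the rest of the post comes from getting the same behavior out of rows and columns.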

We assume the following about the comments:

The comments are text strings no more than a certain length (no pictures or video fragments for now),
Every comment has 0 or 1 parent,
Most of the comments are questions, answers, or hints on assignments and classes, so there are going to be a lot more reads than writes,
Once saved, the comments are rarely updated and even more rarely, if ever, deleted,
The older a comment gets, the fewer replies it receives; the total number of child comments of a particular comment rarely exceeds a certain (large) number; the number depends on the depth of the comment.
For every root comment, there are hundreds or more sub-root comments. This can be achieved if only users of a certain role or reputation can create root comments.
The majority of the comments are just a few levels deep. Much deeper comments are rare.

The following are the implementation details for the comment functionality:

The functionality must be implemented with JPQL only, with no native queries,
There should be as few pessimistic locks as possible.

These requirements are not easy to satisfy simultaneously.

The Challenges

To implement these requirements, we face the following challenges:

The data structures must be compatible with the relational database (rows and columns), yet implement a deeply nested tree.
There should be efficient indexes to quickly find all children of a particular comment, yet no dialect-specific database features should be used.
If all comments are stored in a single table, the root comments cannot be found quickly without advanced dialect-specific database features. Yet, if all root comments are stored in a separate table, we have to use transactions (this challenge is addressed in Part 2).
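
The first two challenges can be illustrated with a small sketch. It assumes the simplest relational layout, where each row stores only a comment id and its parent id, and simulates fetching all descendants with one lookup per tree level. The names `FlatCommentRows`, `Row`, `childrenOfAny`, and `collectDescendants` are hypothetical, and the JPQL in the comment assumes a hypothetical `Comment` entity with a `parent` association. The point is only to show why this naive layout costs roughly one query per level of nesting.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical flat-row model of a comment table (illustration only). */
class FlatCommentRows {

    /** One table row: a comment id and its parent's id (null for a root). */
    static final class Row {
        final long id;
        final Long parentId;

        Row(long id, Long parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    /**
     * Stand-in for a JPQL query along the lines of
     * "SELECT c.id FROM Comment c WHERE c.parent.id IN :ids".
     * Each call represents one round trip to the database.
     */
    static List<Long> childrenOfAny(List<Row> table, Set<Long> parentIds) {
        List<Long> ids = new ArrayList<>();
        for (Row row : table) {
            if (row.parentId != null && parentIds.contains(row.parentId)) {
                ids.add(row.id);
            }
        }
        return ids;
    }

    /**
     * Collects all descendants of rootId into out, level by level,
     * and returns the number of simulated queries that were issued.
     */
    static int collectDescendants(List<Row> table, long rootId, List<Long> out) {
        Set<Long> frontier = new HashSet<>();
        frontier.add(rootId);
        int queries = 0;
        while (!frontier.isEmpty()) {
            List<Long> children = childrenOfAny(table, frontier);
            queries++;
            out.addAll(children);
            frontier = new HashSet<>(children);
        }
        return queries;
    }
}
```

For a chain of comments 1 → 2 → 3 → 4, collecting the descendants of comment 1 issues four simulated queries: one per level of descendants, plus a final one that returns nothing. The number of round trips grows with depth, which is why a deeply nested tree needs a better index than a bare parent reference.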


dzone.com
