How Indexes Work in Nebula Graph - DZone Database

In this article, take a look at how indexes work in Nebula Graph and see core concepts to understand them.

Why Indexes Are Needed in a Graph Database

Indexes are an indispensable feature of a database system. Graph databases are no exception.

An index is a data structure, usually sorted, that the database management system maintains to speed up lookups. Different database systems adopt different index structures.

Popular index types include:

  • B-Tree index
  • B+-Tree index
  • B*-Tree index
  • Hash index
  • Bitmap index
  • Inverted index

Each of them uses its own algorithm to organize data.

A database index allows efficient data retrieval from databases. Despite the query performance improvement, indexes have some disadvantages:

  • It takes time to create and maintain indexes, which scales with dataset size.
  • Indexes need extra physical storage space.
  • It takes more time to insert, delete, and update data because the index also needs to be maintained synchronously.
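The write-side cost listed above can be illustrated with a minimal sketch. It is not Nebula Graph code: the `IndexedStore` class and its fields are hypothetical, and a sorted Python list stands in for whatever structure a real storage engine uses. The point is only that every insert, update, or delete has to touch the index as well as the data.

```python
import bisect


class IndexedStore:
    """Toy store with one secondary index on a single property (hypothetical)."""

    def __init__(self):
        self.rows = {}    # vertex_id -> property value (the data)
        self.index = []   # sorted list of (property value, vertex_id) (the index)

    def upsert(self, vertex_id, value):
        # An update must first remove the stale index entry.
        if vertex_id in self.rows:
            old_entry = (self.rows[vertex_id], vertex_id)
            del self.index[bisect.bisect_left(self.index, old_entry)]
        self.rows[vertex_id] = value
        # Extra work on every write: keep the index sorted.
        bisect.insort(self.index, (value, vertex_id))

    def delete(self, vertex_id):
        value = self.rows.pop(vertex_id)
        del self.index[bisect.bisect_left(self.index, (value, vertex_id))]


store = IndexedStore()
store.upsert(1, "alice")
store.upsert(2, "bob")
store.upsert(1, "carol")   # update: one data write plus two index changes
store.delete(2)            # delete: one data removal plus one index removal
```

Each write does strictly more work than it would without the index, and the index list is additional storage on top of `rows`.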

Taking the above into consideration, Nebula Graph now supports indexes for more efficient retrieval on properties.

This post gives a detailed introduction to the design and practice of indexes in Nebula Graph.

Core Concepts to Understand Indexes in Nebula Graph

Below is a list of common Nebula Graph index terms we use across the post.

  • Tag: A label associated with a list of properties. Each vertex can be associated with multiple tags. A tag is identified by a TagID. You can think of a tag as a node table in SQL.
  • Edge type: Similar to a tag, an edge type is a set of properties on edges. You can think of an edge type as an edge table in SQL.
  • Property: The name-value pairs on a tag or edge type. Its data type is determined by the tag or edge type.
  • Partition: The minimum logical storage unit of Nebula Graph. A StorageEngine can contain multiple partitions. Each partition has a leader replica and follower replicas. We use Raft to guarantee data consistency between the leader and the followers.
  • Graph space: A physically isolated space for a specific graph. Tags and edge types in one graph are independent of those in another graph. A Nebula Graph cluster can have multiple graph spaces.
  • Index: In this post, index refers specifically to an index on the properties of a tag or edge type. Its data type depends on the tag or edge type.
  • TagIndex: An index created for a tag. You can create multiple indexes for the same tag. Cross-tag composite indexes are not yet supported.
  • EdgeIndex: An index created for an edge type. Similarly, you can create multiple indexes for the same edge type. Cross-edge-type composite indexes are not yet supported.
  • Scan Policy: The policy for scanning indexes. There are usually several ways to scan indexes to execute one query statement; the scan policy decides which one is ultimately used.
  • Optimizer: Optimizes query conditions, for example by sorting, splitting, and merging sub-expression nodes of the expression tree of the WHERE clause, to obtain higher query efficiency.

What’s Required for Indexes to Work in a Graph Database

There are two typical ways to query data in Nebula Graph, or more generally in a graph database:

  1. One is starting from a vertex, retrieving its (N-hop) neighbors along certain edge types.
  2. Another is retrieving vertices or edges which contain specified property values.

In the latter scenario, a high-performance scan is needed to fetch the edges or vertices as well as the property values.

To improve the efficiency of queries on property values, we've implemented indexes in Nebula Graph. Because the index keeps the property values of edges or vertices sorted, a query can quickly locate a given value and avoid a full scan.

Here is what we found is required for indexes to work in a graph database:
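As a rough illustration of why sorted property values avoid a full scan, the sketch below compares the two approaches on made-up data. The vertex IDs, the integer `age` property, and the function names are all assumptions for the example; Nebula Graph's actual index lives in its storage engine, not in a Python list.

```python
import bisect

# Hypothetical vertices: (vertex_id, age)
vertices = [(101, 34), (102, 27), (103, 41), (104, 27), (105, 19)]


def full_scan(age):
    """Without an index: inspect every vertex."""
    return sorted(vid for vid, a in vertices if a == age)


# With an index: entries sorted by (property value, vertex_id).
age_index = sorted((a, vid) for vid, a in vertices)


def index_lookup(age):
    """Binary-search the sorted index for the run of entries with this age."""
    lo = bisect.bisect_left(age_index, (age,))      # first entry with value >= age
    hi = bisect.bisect_left(age_index, (age + 1,))  # first entry with value > age
    return [vid for _, vid in age_index[lo:hi]]


print(full_scan(27))     # [102, 104]
print(index_lookup(27))  # [102, 104]
```

Both calls return the same vertices, but `index_lookup` only touches the matching slice of the index after two binary searches, while `full_scan` reads every vertex.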

  • Supporting indexes for properties on tags and edge types.
  • Supporting analysis and generation of index scanning strategy.
  • Supporting index management such as create index, rebuild index, show index, etc.

How Indexes Are Stored in Nebula Graph

Below is a diagram of how indexes are stored in Nebula Graph. Indexes are a part of Nebula Graph’s Storage Service so we place them in the big picture of its storage architecture.

As the figure shows, each Storage Server can contain multiple Storage Engines, and each Storage Engine can contain multiple Partitions.

The replicas of a Partition are synchronized via the Raft protocol. Each Partition contains both data and indexes. The data and indexes of the same vertex or edge are stored in the same Partition.
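The co-location of data and indexes can be sketched as follows. This is a simplified model under stated assumptions, not the real storage layout: the partition count, the modulo-based `partition_of` function, the `Partition` class, and the `name` property are all hypothetical, and replication via Raft is left out entirely.

```python
import bisect

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_of(vertex_id):
    # Assumption: a vertex is assigned to a partition by its ID modulo the count.
    return vertex_id % NUM_PARTITIONS


class Partition:
    """Holds both the data and the index entries of the vertices it owns."""

    def __init__(self):
        self.data = {}    # vertex_id -> properties
        self.index = []   # sorted list of (name, vertex_id)


partitions = [Partition() for _ in range(NUM_PARTITIONS)]


def insert_vertex(vertex_id, name):
    # Data and index entry are written to the same partition.
    part = partitions[partition_of(vertex_id)]
    part.data[vertex_id] = {"name": name}
    bisect.insort(part.index, (name, vertex_id))


def lookup_by_name(name):
    # A property lookup asks every partition to search its own local index.
    hits = []
    for part in partitions:
        pos = bisect.bisect_left(part.index, (name,))
        while pos < len(part.index) and part.index[pos][0] == name:
            hits.append(part.index[pos][1])
            pos += 1
    return sorted(hits)


for vid, name in [(1, "ann"), (5, "ann"), (2, "bob"), (7, "ann")]:
    insert_vertex(vid, name)
```

Because a vertex's index entry sits next to its data, a partition that finds a match in its local index can read the matching properties without contacting another partition.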
