Benchmarking the Mainstream Open Source Distributed Graph Databases

Benchmarking the Mainstream Open Source Distributed Graph Databases

The deep learning and knowledge graph technologies have been developing rapidly in recent years. Find out more about graph databases.

The deep learning and knowledge graph technologies have been developing rapidly in recent years. Compared with the "black box" of deep learning, knowledge graphs are highly interpretable, thus are widely adopted in such scenarios as search recommendations, intelligent customer support, and financial risk management. 

Meituan has been digging deep in the connections buried in the huge amount of business data over the past few years and has gradually developed the knowledge graphs in nearly ten areas, including cuisine graphs, tourism graphs, and commodity graphs. The ultimate goal is to enhance the smart local life. 

Compared with the traditional RDBMS, graph databases can store and query knowledge graphs more efficiently. It gains obvious performance advantage in multi-hop queries to select graph databases as the storage engine. Currently, there are dozens of graph database solutions out there on the market. 

It is imperative for the Meituan team to select a graph database solution that can meet the business requirements and to use the solution as the basis of Meituan's graph storage and graph learning platform. The team has outlined the basic requirements as below per our business status quo:

  1. It should be an open-source project which is also business-friendly

By having control over the source code, the Meituan team can ensure data security and service availability.

  1. It should support clustering and should be able to scale horizontally in terms of both storage and computation capabilities

The knowledge graph data size in Meituan can reach hundreds of billions of vertices and edges in total and the throughput can reach tens of thousands of QPS. With that being said, the single-node deployment cannot meet Meituan's storage requirements.

  1. It should work under OLTP scenarios with the capability of multi-hop queries at the millisecond level.

To ensure the best search experience for Meituan users, the team has strictly restricted the timeout value within all chains of paths. Therefore, it is unacceptable to respond to a query at the second level.

  1. It should be able to import data in batch

The knowledge graph data is usually stored in data warehouses like Hive. The graph database should be equipped with the capability to quickly import data from such warehouses to the graph storage to ensure service effectiveness.

The Meituan team has tried the top 30 graph databases on DB-Engines and found that most well-known graph databases only support single-node deployment with their open-source edition, for example, Neo4j, ArangoDB, Virtuoso, TigerGraph, RedisGraph. This means that the storage service cannot scale horizontally and the requirement to store large-scale knowledge graph data cannot be met. 

After thorough research and comparison, the team has selected the following graph databases for the final round: Nebula Graph (developed by a startup team who originally came from Alibaba), Dgraph (developed by a startup team who originally came from Google), and HugeGraph (developed by Baidu).

A Summary of The Testing Process 

Hardware Configuration

  1. Database instances: Docker containers running on different machines
  2. Single instance resources: 32 Cores, 64 GB Memory, 1 TB SSD (Intel(R) Xeon(R) Gold 5218 CPU @ 2.30 GHz)
  3. Number of instances: Three

database tutorial graph database database performance nebula graph dgraph graph database adoption

Bootstrap 5 Complete Course with Examples

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Building a simple Applications with Vue 3

Deno Crash Course: Explore Deno and Create a full REST API with Deno

How to Build a Real-time Chat App with Deno and WebSockets

Convert HTML to Markdown Online

HTML entity encoder decoder Online

Data Migration From JanusGraph to Nebula Graph - Practice at 360 Finance

In this article, take a look at data migration from JanusGraph to Nebula Graph. Speaking of graph data processing, we have had experience in using various graph databases. In the beginning, we used the stand-alone edition of AgensGraph. Later, due to its performance limitations, we switched to JanusGraph, a distributed graph database.

How Indexes Work in Nebula Graph - DZone Database

In this article, take a look at how indexes work in Nebula Graph and see core concepts to understand them.

Nebula Graph Source Code Explained via a Sample Graph Query

In this article, take a look at the Nebula Graph source code and see a sample graph query. When I saw the Nebula Graph code repository for the first time, I was so shocked by its huge size that I didn’t know how to dig into the source code.

Nebula Graph 1.0 Benchmark Report Based on the LDBC Dataset

In this article, take a look at key configurations in Nebula graph, an intro into the dataset, and more.

Analyzing Relationships in Game of Thrones With NetworkX, Gephi, and Nebula Graph (Part 1)

In this article, see part one of how to analyze relationships in Game of Thrones with NetworksX, Gephi, and Nebula Graph.