Benchmarking the Mainstream Open Source Distributed Graph Databases

The deep learning and knowledge graph technologies have been developing rapidly in recent years. Compared with the “black box” of deep learning, knowledge graphs are highly interpretable, thus are widely adopted in such scenarios as search recommendations, intelligent customer support, and financial risk management.

Meituan has been digging deep in the connections buried in the huge amount of business data over the past few years and has gradually developed the knowledge graphs in nearly ten areas, including cuisine graphs, tourism graphs, and commodity graphs. The ultimate goal is to enhance the smart local life.

Compared with the traditional RDBMS, graph databases can store and query knowledge graphs more efficiently. It gains obvious performance advantage in multi-hop queries to select graph databases as the storage engine. Currently, there are dozens of graph database solutions out there on the market.

It is imperative for the Meituan team to select a graph database solution that can meet the business requirements and to use the solution as the basis of Meituan’s graph storage and graph learning platform. The team has outlined the basic requirements as below per our business status quo:

  1. It should be an open-source project which is also business-friendly

By having control over the source code, the Meituan team can ensure data security and service availability.

  1. It should support clustering and should be able to scale horizontally in terms of both storage and computation capabilities

The knowledge graph data size in Meituan can reach hundreds of billions of vertices and edges in total and the throughput can reach tens of thousands of QPS. With that being said, the single-node deployment cannot meet Meituan’s storage requirements.

  1. It should work under OLTP scenarios with the capability of multi-hop queries at the millisecond level.

To ensure the best search experience for Meituan users, the team has strictly restricted the timeout value within all chains of paths. Therefore, it is unacceptable to respond to a query at the second level.

  1. It should be able to import data in batch

The knowledge graph data is usually stored in data warehouses like Hive. The graph database should be equipped with the capability to quickly import data from such warehouses to the graph storage to ensure service effectiveness.

The Meituan team has tried the top 30 graph databases on DB-Engines and found that most well-known graph databases only support single-node deployment with their open-source edition, for example, Neo4j, ArangoDB, Virtuoso, TigerGraph, RedisGraph. This means that the storage service cannot scale horizontally and the requirement to store large-scale knowledge graph data cannot be met.

After thorough research and comparison, the team has selected the following graph databases for the final round: Nebula Graph (developed by a startup team who originally came from Alibaba), Dgraph (developed by a startup team who originally came from Google), and HugeGraph (developed by Baidu).

A Summary of The Testing Process

Hardware Configuration

  1. Database instances: Docker containers running on different machines
  2. Single instance resources: 32 Cores, 64 GB Memory, 1 TB SSD (Intel® Xeon® Gold 5218 CPU @ 2.30 GHz)
  3. Number of instances: Three

#database #tutorial #graph database #database performance #nebula graph #dgraph #graph database adoption

What is GEEK

Buddha Community

Benchmarking the Mainstream Open Source Distributed Graph Databases

Benchmarking the Mainstream Open Source Distributed Graph Databases

The deep learning and knowledge graph technologies have been developing rapidly in recent years. Compared with the “black box” of deep learning, knowledge graphs are highly interpretable, thus are widely adopted in such scenarios as search recommendations, intelligent customer support, and financial risk management.

Meituan has been digging deep in the connections buried in the huge amount of business data over the past few years and has gradually developed the knowledge graphs in nearly ten areas, including cuisine graphs, tourism graphs, and commodity graphs. The ultimate goal is to enhance the smart local life.

Compared with the traditional RDBMS, graph databases can store and query knowledge graphs more efficiently. It gains obvious performance advantage in multi-hop queries to select graph databases as the storage engine. Currently, there are dozens of graph database solutions out there on the market.

It is imperative for the Meituan team to select a graph database solution that can meet the business requirements and to use the solution as the basis of Meituan’s graph storage and graph learning platform. The team has outlined the basic requirements as below per our business status quo:

  1. It should be an open-source project which is also business-friendly

By having control over the source code, the Meituan team can ensure data security and service availability.

  1. It should support clustering and should be able to scale horizontally in terms of both storage and computation capabilities

The knowledge graph data size in Meituan can reach hundreds of billions of vertices and edges in total and the throughput can reach tens of thousands of QPS. With that being said, the single-node deployment cannot meet Meituan’s storage requirements.

  1. It should work under OLTP scenarios with the capability of multi-hop queries at the millisecond level.

To ensure the best search experience for Meituan users, the team has strictly restricted the timeout value within all chains of paths. Therefore, it is unacceptable to respond to a query at the second level.

  1. It should be able to import data in batch

The knowledge graph data is usually stored in data warehouses like Hive. The graph database should be equipped with the capability to quickly import data from such warehouses to the graph storage to ensure service effectiveness.

The Meituan team has tried the top 30 graph databases on DB-Engines and found that most well-known graph databases only support single-node deployment with their open-source edition, for example, Neo4j, ArangoDB, Virtuoso, TigerGraph, RedisGraph. This means that the storage service cannot scale horizontally and the requirement to store large-scale knowledge graph data cannot be met.

After thorough research and comparison, the team has selected the following graph databases for the final round: Nebula Graph (developed by a startup team who originally came from Alibaba), Dgraph (developed by a startup team who originally came from Google), and HugeGraph (developed by Baidu).

A Summary of The Testing Process

Hardware Configuration

  1. Database instances: Docker containers running on different machines
  2. Single instance resources: 32 Cores, 64 GB Memory, 1 TB SSD (Intel® Xeon® Gold 5218 CPU @ 2.30 GHz)
  3. Number of instances: Three

#database #tutorial #graph database #database performance #nebula graph #dgraph #graph database adoption

Ruth  Nabimanya

Ruth Nabimanya

1620663480

Which Database Is Right For You?Graph Database vs. Relational Database

At the very beginning of most development endeavors lies an important question: What database do I choose? There is such an abundance of database technologies at this moment, it’s no wonder many developers don’t have the time or energy to research new ones. If you are one of those developers and you aren’t very familiar with graph databases in general, you’ve come to the right place!

In this article, you will learn about the main differences between a graph database and a relational database, what kind of use-cases are best suited for each database type, and what are their strengths and weaknesses.

How Does a Graph Database Differ from a Relational Database?

The Graph Data Model

The Relational Data Model

When to use a Graph Database?

When not to use a Graph Database

Is a Graph Database Worth it?

#graph-database #relational-database #graph-theory #graph-analysis #data-analytics #networks #data #database

Tyrique  Littel

Tyrique Littel

1598461200

An Open-Source Book About the Open Source World

Open source today is a word that often include a lot of things, such as open knowledge (Wikimedia projects), open hardware (Arduino, Raspberry Pi), open formats (ODT/ODS/ODP) and so on.

It is a world of opportunities that can be difficult for newcomers but also for intermediates. This article will help you discover how to approach specific roles, activities or projects/communities in the best way.

Everything Started with “Coaching for OpenSource Communities 2.0”

I decided to write a book in my personal style about my experience in the last 7 to 8 years in open source. I was surprised when I reached 100 pages about various different topics.

My idea was to write something that I would like to read, so nothing that is boring or complicated, but full of real facts.

The second goal was to include my experience but also my philosophy on contributing and how I contribute daily.

Thirdly, I wanted to give a lot of hints and resources and an overall view of this open source world.

Basically, I wanted to write something different from self-help or coaching books that includes just a list of suggestions and best practices. Instead, I take real examples from real life about the OSS world.

As a contributor and developer, I prefer to have real cases to study, because best practices are useful, but we need to learn from others and this world is full of good and bad cases to discover.

In 2019, I started writing a book after Fosdem 2019 and after 2 years inside the Mozilla Reps Council. In that Fosdem edition, I had a talk “Coaching for Open Source Communities 2.0” and after the feedback at the conference and my thoughts in various roles, activities, and projects, it was time to write something.

At the end it wasn’t a manual but a book that included my experience, learnings, best practices and so on in Localization, Development, Project Maintainer, Sysadmin, Community Management, Mentor, Speaker and so on. It contains the following sections:

  • Biography - This choice isn’t for self promotion but just to understand my point of view and my story that can be inspiring for others
  • Philosophy - Not the usual description of Open Source or the 4 freedoms, but just what Open Source means and how you can help
  • How to live inside the Open Source - A discovery about communications and tools, understanding the various kind of people and the best way to talk with your community
  • How to choose a project - Starting with some questions to yourself and how to involve more people in your project
  • The activity - Open Source is based on tasks that can be divided in 2 levels: Support, Testing, Marketing, Development etc
  • How to use your time - We are busy, we have a life, a job and a family but Open Source can be time-consuming
  • Why document is important - How writing documentation can be healthy for your community and the project’s future and brand

There are also three appendices that are manuals which I wrote throughout the years and gathered and improved for this book. They are about: community management, public speaking, and mentoring.

The book ends with my point of view about the future and what we have to do to change opinions about those topics.

I wrote this book and published in October 2019, but it was only possible with the help of reviews and localizers that improved and contributed. Yes, because this book is open source and free for everyone.

I picked the GPL license because this license changed the world and my life in the best way. Using this license is just a tribute. This decision usually is not clear because after all this is a book and there are better licenses like Creative Commons.

#open-source #contributing-to-open-source #programming #software-development #development #coding #books #open-source-software

Ray  Patel

Ray Patel

1623348300

Top 8 Java Open Source Projects You Should Get Your Hands-on [2021]

Learning about Java is no easy feat. It’s a prevalent and in-demand programming language with applications in numerous sectors. We all know that if you want to learn a new skill, the best way to do so is through using it. That’s why we recommend working on projects.

So if you’re a Java student, then you’ve come to the right place as this article will help you learn about the most popular Java open source projects. This way, you’d have a firm grasp of industry trends and the programming language’s applications.

However, before we discuss its various projects, it’s crucial to examine the place where you can get those projects – GitHub. Let’s begin.

#full stack development #java open source projects #java projects #open source projects #top 8 java open source projects #java open source projects

Mikel  Okuneva

Mikel Okuneva

1599897600

Data Migration From JanusGraph to Nebula Graph - Practice at 360 Finance

Speaking of graph data processing, we have had experience in using various graph databases. In the beginning, we used the stand-alone edition of AgensGraph. Later, due to its performance limitations, we switched to JanusGraph, a distributed graph database. I introduced details on how to migrate data in my article “Migrate tens of billions of graph data into JanusGraph (only in Chinese)”. As the data size and the number of business calls grew, a new problem appeared: Each query consumed too much time. In some business scenarios, a single query took up to 10 seconds, and with increase of the data size, a more complicated single query needed two or three seconds. These problems had seriously affected the performance of the entire business process and the development of related businesses.

The architecture design of JanusGraph determines that a single query is time-consuming. The core reason is that its storage depends on the external storage, and JanusGraph cannot control the external storage well. In our production environment, an HBase cluster is used, which makes it impossible for all queries to be pushed down to the storage layer for processing. Instead, data can only be queried from HBase to the JanusGraph Server memory and then filtered accordingly.

#database #tutorial #graph database #database performance #nebula graph #graph database adoption