Working With Neo4j Graph Database

Traditionally, most of our applications have relied on relational database management systems (RDBMS) for their data storage needs. But an RDBMS is not very efficient at handling high volumes of connected data. A graph database is purpose-built to store this category of connected data.

In this article, we will explore the basic concepts of a graph database and then look at some examples of graph data modeling and querying by using the Neo4j graph database.

What is a Graph

We must understand what a graph is before trying to understand a graph database:

A graph is a concept taken from graph theory in Mathematics. It is a data structure composed of a set of nodes (also called vertices) connected by a set of edges. A popular example of a graph is a social network where the members of a network are linked to each other by means of different types of relationships as shown here:

Network With Nodes Example

In this diagram, the members of the network are represented by nodes and the edges are the relations between them.

Each node can represent an entity (a person, place, thing, category, or another piece of data).

This graph data structure allows us to model all kinds of scenarios having connected data from a network path to a network of servers, or anything else defined by relationships. We can break down any structure or concept irrespective of complexity into a set of constituent parts that have some relationship to one another.
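As a minimal sketch of this idea, the snippet below stores a small social network as a set of nodes and a list of typed edges, then indexes the edges by node. The member names, the relationship types, and the `build_adjacency` helper are all made up for illustration, and relationships are assumed to be undirected.

```python
from collections import defaultdict

# Hypothetical members and relationship types, for illustration only.
nodes = {"Alice", "Bob", "Carol", "Dave"}
edges = [
    ("Alice", "FRIEND_OF", "Bob"),
    ("Bob", "COLLEAGUE_OF", "Carol"),
    ("Carol", "FRIEND_OF", "Dave"),
]


def build_adjacency(edge_list):
    """Index the edges by node so a node's neighbors can be looked up directly."""
    adjacency = defaultdict(list)
    for source, relation, target in edge_list:
        # Assumption: relationships are undirected, so record them on both ends.
        adjacency[source].append((relation, target))
        adjacency[target].append((relation, source))
    return adjacency


adjacency = build_adjacency(edges)
print(adjacency["Bob"])
```

Looking up `adjacency["Bob"]` returns Bob's relationships without scanning the full edge list, which is the property that makes connected data cheap to navigate.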

What is a Graph Database

We can best understand a graph database by thinking in terms of the data model. A traditional data model is composed of entities associated with each other through various types of relationships. We visualize this data model with an Entity Relationship Diagram (ERD).

In contrast, we store a graph data structure in a graph data model consisting of vertices or nodes that represent the entities and edges that represent the relationships between these entities.

As with any database management system, we perform create, read, update, and delete (CRUD) operations in a graph database while working with connected data. However, since relationships are first-class citizens in graph data stores, we do not have to specify data connections using an implementation-specific technique such as foreign keys.
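To make "relationships as first-class citizens" more concrete, here is a toy in-memory sketch, not Neo4j's API. The `TinyGraph`, `Node`, and `Relationship` names are hypothetical. The point is that a relationship is a stored object that references its two nodes directly, so reading connected data needs no foreign-key lookup.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    node_id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str
    start: Node  # direct reference to a node, not a foreign-key value
    end: Node
    properties: dict = field(default_factory=dict)


class TinyGraph:
    """Toy property graph with CRUD operations (illustrative only)."""

    def __init__(self):
        self._ids = count(1)
        self.nodes = {}
        self.relationships = []

    def create_node(self, label, **properties):
        node = Node(next(self._ids), label, properties)
        self.nodes[node.node_id] = node
        return node

    def create_relationship(self, start, rel_type, end, **properties):
        rel = Relationship(rel_type, start, end, properties)
        self.relationships.append(rel)
        return rel

    def read_neighbors(self, node, rel_type):
        return [r.end for r in self.relationships
                if r.start is node and r.rel_type == rel_type]

    def update_node(self, node, **properties):
        node.properties.update(properties)

    def delete_node(self, node):
        # Drop the relationships attached to the node before the node itself.
        self.relationships = [r for r in self.relationships
                              if r.start is not node and r.end is not node]
        del self.nodes[node.node_id]


graph = TinyGraph()
alice = graph.create_node("Person", name="Alice")        # create
bob = graph.create_node("Person", name="Bob")
graph.create_relationship(alice, "FOLLOWS", bob)
followed_names = [n.properties["name"]
                  for n in graph.read_neighbors(alice, "FOLLOWS")]  # read
graph.update_node(alice, city="Oslo")                     # update
graph.delete_node(bob)                                    # delete
print(followed_names, alice.properties, len(graph.nodes))
```

A real graph database adds persistence, transactions, and indexing on top of this shape; the sketch only shows how the four operations look when connections are stored as data.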

To get a feel for working with a graph database, let us use Neo4j, a widely used open-source graph database.

Introducing Neo4j Graph Database

Neo4j is an open-source, NoSQL, native graph database that supports ACID transactions. It can be deployed in environments that demand high availability and reliability, which makes it suitable for storing data in production systems. It comes in Community and Enterprise editions.

Neo4j also has a good ecosystem of tools for development and for supporting operational activities. It provides drivers and integrations for connecting applications built on a wide array of technology stacks.

#database #graph #neo4j

Grace Lesch


Migrating SQL Server graph databases to Neo4j

Migrating SQL Server graphs to Neo4j

Although I prefer using SSIS for data transfer operations, Neo4j doesn’t have any official or stable (free) SSIS component. While searching, I found a third-party component that is still in beta.

Another approach for migrating SQL Server graphs to Neo4j is to export the data into flat files and then import them into Neo4j.
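As a rough sketch of the flat-file route, the snippet below writes nodes and relationships to CSV text using the header conventions of Neo4j's bulk CSV import (`:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`). The sample rows, the `Person` label, the `FRIEND_OF` type, and both helper functions are hypothetical stand-ins for data read from SQL Server node and edge tables; check the import tool's documentation for the exact options of your Neo4j version.

```python
import csv
import io

# Hypothetical rows, standing in for data read from SQL Server node and edge tables.
people = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
friendships = [{"from_id": 1, "to_id": 2}]


def nodes_to_csv(rows, label):
    """Render node rows as CSV text with an id column and a label column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id:ID", "name", ":LABEL"])
    for row in rows:
        writer.writerow([row["id"], row["name"], label])
    return buffer.getvalue()


def edges_to_csv(rows, rel_type):
    """Render edge rows as CSV text that refers to the node ids above."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([":START_ID", ":END_ID", ":TYPE"])
    for row in rows:
        writer.writerow([row["from_id"], row["to_id"], rel_type])
    return buffer.getvalue()


nodes_csv = nodes_to_csv(people, "Person")
edges_csv = edges_to_csv(friendships, "FRIEND_OF")
print(nodes_csv)
print(edges_csv)
```

In practice the two strings would be written to files and handed to the import tool; building them in memory here keeps the sketch self-contained.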

The third approach is to develop a small application in C# that migrates the nodes and edges created in SQL Server to a Neo4j database. This approach is explained in detail in this section.

#graph database #database #neo4j #neo4j database

Ruth Nabimanya


Which Database Is Right For You? Graph Database vs. Relational Database

At the very beginning of most development endeavors lies an important question: What database do I choose? There is such an abundance of database technologies at this moment, it’s no wonder many developers don’t have the time or energy to research new ones. If you are one of those developers and you aren’t very familiar with graph databases in general, you’ve come to the right place!

In this article, you will learn about the main differences between a graph database and a relational database, which use cases are best suited to each database type, and what their strengths and weaknesses are.

How Does a Graph Database Differ from a Relational Database?

The Graph Data Model

The Relational Data Model

When to use a Graph Database?

When not to use a Graph Database

Is a Graph Database Worth it?

#graph-database #relational-database #graph-theory #graph-analysis #data-analytics #networks #data #database

Benchmarking the Mainstream Open Source Distributed Graph Databases

Deep learning and knowledge graph technologies have developed rapidly in recent years. Compared with the “black box” of deep learning, knowledge graphs are highly interpretable and are therefore widely adopted in scenarios such as search recommendations, intelligent customer support, and financial risk management.

Over the past few years, Meituan has been digging deep into the connections buried in its huge volume of business data and has gradually built knowledge graphs in nearly ten areas, including cuisine graphs, tourism graphs, and commodity graphs. The ultimate goal is to make local life services smarter.

Compared with a traditional RDBMS, graph databases can store and query knowledge graphs more efficiently. Choosing a graph database as the storage engine brings a clear performance advantage in multi-hop queries. Currently, there are dozens of graph database solutions on the market.
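To illustrate what a multi-hop query does, here is a small breadth-first sketch over an adjacency mapping. The `neighbors_within_hops` function and the sample vertices are hypothetical. In a relational schema with an edge table, each extra hop typically means another self-join, whereas a traversal simply follows the stored connections.

```python
from collections import deque


def neighbors_within_hops(adjacency, start, max_hops):
    """Return {vertex: hop distance} for every vertex reachable within max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if distances[vertex] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[vertex] + 1
                queue.append(neighbor)
    del distances[start]  # the start vertex is not its own neighbor
    return distances


# Hypothetical knowledge-graph fragment, for illustration only.
adjacency = {
    "restaurant_a": ["dish_1", "dish_2"],
    "dish_1": ["ingredient_x"],
    "dish_2": [],
    "ingredient_x": ["supplier_q"],
}

two_hop = neighbors_within_hops(adjacency, "restaurant_a", 2)
print(two_hop)
```

With a limit of two hops the supplier is not reached, since it sits three hops away from the starting restaurant.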

It is imperative for the Meituan team to select a graph database solution that can meet the business requirements and to use it as the basis of Meituan’s graph storage and graph learning platform. The team has outlined the basic requirements below, based on the current state of the business:

  1. It should be an open-source project which is also business-friendly

By having control over the source code, the Meituan team can ensure data security and service availability.

  2. It should support clustering and should be able to scale horizontally in terms of both storage and computation capabilities

The knowledge graph data at Meituan can reach hundreds of billions of vertices and edges in total, and throughput can reach tens of thousands of QPS. At that scale, a single-node deployment cannot meet Meituan’s storage requirements.

  3. It should work under OLTP scenarios with the capability of multi-hop queries at the millisecond level.

To ensure the best search experience for Meituan users, the team strictly limits the timeout at every step of the request chain. Response times on the order of seconds are therefore unacceptable.

  4. It should be able to import data in batch

The knowledge graph data is usually stored in data warehouses like Hive. The graph database should be equipped with the capability to quickly import data from such warehouses to the graph storage to ensure service effectiveness.

The Meituan team has tried the top 30 graph databases on DB-Engines and found that most well-known graph databases only support single-node deployment with their open-source edition, for example, Neo4j, ArangoDB, Virtuoso, TigerGraph, RedisGraph. This means that the storage service cannot scale horizontally and the requirement to store large-scale knowledge graph data cannot be met.

After thorough research and comparison, the team has selected the following graph databases for the final round: Nebula Graph (developed by a startup team who originally came from Alibaba), Dgraph (developed by a startup team who originally came from Google), and HugeGraph (developed by Baidu).

A Summary of The Testing Process

Hardware Configuration

  1. Database instances: Docker containers running on different machines
  2. Single instance resources: 32 Cores, 64 GB Memory, 1 TB SSD (Intel® Xeon® Gold 5218 CPU @ 2.30 GHz)
  3. Number of instances: Three

#database #tutorial #graph database #database performance #nebula graph #dgraph #graph database adoption

Mikel Okuneva


Data Migration From JanusGraph to Nebula Graph - Practice at 360 Finance

Speaking of graph data processing, we have experience with various graph databases. In the beginning, we used the stand-alone edition of AgensGraph. Later, due to its performance limitations, we switched to JanusGraph, a distributed graph database. I described how we migrated the data in my article “Migrate tens of billions of graph data into JanusGraph (only in Chinese)”. As the data size and the number of business calls grew, a new problem appeared: each query consumed too much time. In some business scenarios, a single query took up to 10 seconds, and as the data size increased, a more complicated single query needed two or three seconds. These problems seriously affected the performance of the entire business process and the development of related businesses.

The architecture of JanusGraph makes a single query time-consuming. The core reason is that it depends on external storage, which it cannot control well. Our production environment uses an HBase cluster, which makes it impossible to push every query down to the storage layer for processing. Instead, data must first be read from HBase into the memory of the JanusGraph Server and then filtered there.
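The cost of filtering on the query server rather than in the storage layer can be shown with a toy comparison. This is not JanusGraph or HBase code: the stored edges and both functions are hypothetical, and the only thing measured is how many rows cross the boundary between storage and the query server.

```python
# Hypothetical stored edges: (source, target, amount).
STORED_EDGES = [
    ("a", "b", 5),
    ("a", "c", 50),
    ("a", "d", 500),
    ("b", "c", 7),
]


def fetch_then_filter(source, min_amount):
    """No pushdown: ship every edge of `source` to the server, then filter there."""
    transferred = [edge for edge in STORED_EDGES if edge[0] == source]
    result = [edge for edge in transferred if edge[2] >= min_amount]
    return result, len(transferred)


def filter_in_storage(source, min_amount):
    """Pushdown: the storage layer applies the predicate and ships only matches."""
    transferred = [edge for edge in STORED_EDGES
                   if edge[0] == source and edge[2] >= min_amount]
    return transferred, len(transferred)


rows_a, shipped_a = fetch_then_filter("a", 100)
rows_b, shipped_b = filter_in_storage("a", 100)
print(rows_a == rows_b, shipped_a, shipped_b)
```

Both strategies return the same rows, but the first ships three rows to produce one result. On vertices with very many edges, that gap is the kind of overhead the paragraph above describes.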

#database #tutorial #graph database #database performance #nebula graph #graph database adoption

Edison Stark


How Indexes Work in Nebula Graph - DZone Database

Why Indexes Are Needed in a Graph Database

Indexes are an indispensable function in a database system. Graph databases are no exception.

An index is actually a sorted data structure in the database management system. Different database systems adopt different sorting structures.

Popular index types include:

  • B-Tree index
  • B+Tree index
  • B*-Tree index
  • Hash index
  • Bitmap index
  • Inverted index

Each of them uses its own sorting algorithm.

A database index allows efficient data retrieval from databases. Despite the query performance improvement, indexes have some disadvantages:

  • It takes time to create and maintain indexes, which scales with dataset size.
  • Indexes need extra physical storage space.
  • It takes more time to insert, delete, and update data because the index also needs to be maintained synchronously.
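The trade-off can be sketched with a sorted list kept in order by Python's `bisect` module. This is a simplified stand-in for a real index structure such as a B+Tree, and the `SortedPropertyIndex` class and sample records are hypothetical.

```python
import bisect


class SortedPropertyIndex:
    """Toy index: a list of (value, record_id) pairs kept in sorted order."""

    def __init__(self):
        self._entries = []

    def insert(self, value, record_id):
        # Maintenance cost: keeping the list sorted means shifting entries on insert.
        bisect.insort(self._entries, (value, record_id))

    def range(self, low, high):
        """Return record ids whose value lies in [low, high], in value order."""
        # (low,) sorts before every (low, record_id), so this finds the first match.
        position = bisect.bisect_left(self._entries, (low,))
        matches = []
        while position < len(self._entries):
            value, record_id = self._entries[position]
            if value > high:
                break  # entries are sorted, so nothing further can match
            matches.append(record_id)
            position += 1
        return matches


age_index = SortedPropertyIndex()
for age, vertex_id in [(34, "v1"), (27, "v2"), (41, "v3"), (30, "v4")]:
    age_index.insert(age, vertex_id)

in_range = age_index.range(28, 35)
print(in_range)
```

The lookup jumps straight to the first qualifying entry instead of scanning every record, while each insert pays to keep the entries ordered. The index is also a second copy of the property values, which is the extra storage mentioned above.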

Taking the above into consideration, Nebula Graph now supports indexes for more efficient retrieval by property values.

This post gives a detailed introduction to the design and practice of indexes in Nebula Graph.

Core Concepts to Understand Indexes in Nebula Graph

Below is a list of common Nebula Graph index terms we use across the post.

  • Tag: A label associated with a list of properties. Each vertex can associate with multiple tags. Tag is identified with a TagID. You can regard tag as a node table in SQL.
  • Edge type: Similar to tag, edge type is a cluster of properties on edges. You can regard edge type as an edge table in SQL.
  • Property: The name-value pairs on tag or edge. Its data type is determined by the tag or edge type.
  • Partition: The minimum logical storage unit of Nebula Graph. A StorageEngine can contain multiple partitions. Each partition has a leader replica and follower replicas, and we use Raft to guarantee data consistency between them.
  • Graph space: A physically isolated space for a specific graph. Tags and edge types in one graph are independent of those in another graph. A Nebula Graph cluster can have multiple graph spaces.
  • Index: Index in this post refers specifically to the index of tag or edge type properties. Its data type depends on tag or edge type.
  • TagIndex: An index created for a tag. You can create multiple indexes for the same tag. Cross-tag composite index is yet to be supported.
  • EdgeIndex: An index created for an edge type. Similarly, you can create multiple indexes for the same edge type. Cross-edge-type composite index is yet to be supported.
  • Scan Policy: The policy to scan indexes. Usually, there are multiple methods to scan indexes to execute one query statement, but the scan policy itself gets to decide which method to use ultimately.
  • Optimizer: Optimizes query conditions, for example by sorting, splitting, and merging sub-expression nodes of the expression tree of the where clause, to obtain higher query efficiency.

What’s Required for Indexes to Work in a Graph Database

There are two typical ways to query data in Nebula Graph, or more generally in a graph database:

  1. One is starting from a vertex, retrieving its (N-hop) neighbors along certain edge types.
  2. Another is retrieving vertices or edges which contain specified property values.

In the latter scenario, a high-performance scan is needed to fetch the edges or vertices as well as the property values.

To improve the efficiency of queries on property values, we’ve implemented indexes in Nebula Graph. Because the property values of edges or vertices are kept sorted, users can quickly locate a given property value and avoid a full scan.

Here’s what we found to be required for indexes to work in a graph database:

  • Supporting indexes for properties on tags and edge types.
  • Supporting analysis and generation of index scanning strategy.
  • Supporting index management such as create index, rebuild index, show index, etc.

How Indexes Are Stored in Nebula Graph

Below is a diagram of how indexes are stored in Nebula Graph. Indexes are a part of Nebula Graph’s Storage Service so we place them in the big picture of its storage architecture.

As the figure shows, each Storage Server can contain multiple Storage Engines, and each Storage Engine can contain multiple Partitions.

Different Partitions are synchronized via Raft protocol. Each Partition contains both data and indexes. The data and indexes of the same vertex or edge will be stored in the same Partition.
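The co-location of data and indexes can be sketched as follows. This is an illustrative model rather than Nebula Graph's actual storage code: the modulo partitioning rule, the partition count, and the `Partition`, `insert_vertex`, and `lookup` names are all assumptions, and Raft replication is left out.

```python
NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(vertex_id):
    # Assumption: a simple modulo rule decides which partition owns a vertex.
    return vertex_id % NUM_PARTITIONS


class Partition:
    """Holds both the data and the index entries of the vertices it owns."""

    def __init__(self):
        self.data = {}   # vertex_id -> properties
        self.index = {}  # (property name, value) -> set of vertex ids


partitions = [Partition() for _ in range(NUM_PARTITIONS)]


def insert_vertex(vertex_id, properties):
    part = partitions[partition_for(vertex_id)]
    part.data[vertex_id] = properties
    # The index entry is written to the same partition as the data it points to.
    for name, value in properties.items():
        part.index.setdefault((name, value), set()).add(vertex_id)


def lookup(name, value):
    """Ask each partition's local index, then merge the results."""
    found = []
    for part in partitions:
        for vertex_id in sorted(part.index.get((name, value), ())):
            found.append((vertex_id, part.data[vertex_id]))
    return found


insert_vertex(1, {"name": "Tim"})
insert_vertex(6, {"name": "Tony"})
insert_vertex(9, {"name": "Tim"})
tims = lookup("name", "Tim")
print(tims)
```

Because an index entry and the record it points to live in the same partition, a lookup can return the full properties from that partition without a second hop to another one.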

#tutorial #graph database #index #database indexes #nebula graph #database