Traditionally, most of our applications have relied on relational database management systems (RDBMS) for their data storage needs. But relational databases are not very efficient at handling high volumes of connected data. A graph database is purpose-built to store this category of connected data.
In this article, we will explore the basic concepts of a graph database and then look at some examples of graph data modeling and querying by using the Neo4j graph database.
We must understand what a graph is before trying to understand a graph database:
A graph is a concept taken from graph theory in mathematics. It is a data structure composed of a set of nodes (also called vertices) connected by a set of edges. A popular example of a graph is a social network, where the members of the network are linked to each other by different types of relationships, as shown here:
In this diagram, the members of the network are represented by nodes and the edges are the relations between them.
Each node can represent an entity (a person, place, thing, category, or another piece of data).
This graph data structure allows us to model all kinds of scenarios having connected data from a network path to a network of servers, or anything else defined by relationships. We can break down any structure or concept irrespective of complexity into a set of constituent parts that have some relationship to one another.
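As a minimal sketch of the node-and-edge structure described above, the following Python snippet stores a tiny social network as an adjacency list. The member names and relationship types are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical members and relationship types, used only for illustration.
edges = [
    ("Alice", "FRIEND_OF", "Bob"),
    ("Bob", "WORKS_WITH", "Carol"),
    ("Alice", "FOLLOWS", "Carol"),
]

# Adjacency list: each node maps to its outgoing (relationship, neighbor) pairs.
graph = defaultdict(list)
for source, relation, target in edges:
    graph[source].append((relation, target))


def neighbors(node):
    """Return the nodes directly connected to `node` by an outgoing edge."""
    return [target for _, target in graph.get(node, [])]


def relations_between(source, target):
    """Return the relationship types on edges from `source` to `target`."""
    return [relation for relation, other in graph.get(source, []) if other == target]


print(neighbors("Alice"))
print(relations_between("Alice", "Carol"))
```

The same shape works for any connected data: swap the members for servers, places, or categories, and the relationship types for whatever links them.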
We can best understand a graph database by thinking in terms of the data model. A traditional data model is composed of entities associated with each other through various types of relationships. We visualize this data model with an Entity Relationship Diagram (ERD).
In contrast, we store a graph data structure in a graph data model consisting of vertices or nodes that represent the entities and edges that represent the relationships between these entities.
As in any database management system, we perform create, read, update, and delete (CRUD) operations in a graph database while working with connected data. However, since relationships are first-class citizens in graph data stores, we do not have to specify data connections using an implementation-specific technique such as foreign keys.
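To make the idea of relationships as first-class citizens more concrete, here is a small in-memory sketch in Python. The `PropertyGraph` class and its method names are hypothetical and are not the API of Neo4j or any other product. Scanning a plain list of relationships is also a simplification of how a real graph store locates them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    # A relationship is its own record with a type and properties,
    # rather than a foreign-key column on one of the entities.
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory graph supporting basic CRUD operations."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def create_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def create_relationship(self, start, end, rel_type, **properties):
        self.relationships.append(Relationship(start, end, rel_type, properties))

    def read_related(self, node_id, rel_type):
        """Return nodes reached from `node_id` over relationships of `rel_type`."""
        return [
            self.nodes[rel.end]
            for rel in self.relationships
            if rel.start == node_id and rel.rel_type == rel_type
        ]

    def update_node(self, node_id, **properties):
        self.nodes[node_id].properties.update(properties)

    def delete_node(self, node_id):
        # Deleting a node also removes every relationship attached to it.
        del self.nodes[node_id]
        self.relationships = [
            rel for rel in self.relationships if node_id not in (rel.start, rel.end)
        ]


g = PropertyGraph()
g.create_node(1, "Person", name="Alice")
g.create_node(2, "Person", name="Bob")
g.create_relationship(1, 2, "KNOWS", since=2020)
g.update_node(2, city="Berlin")
print([node.properties for node in g.read_related(1, "KNOWS")])
```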
To get a feel for working with a graph database, let us use Neo4j, which is a widely used open-source graph database.
Neo4j is an open-source, NoSQL, native graph database supporting ACID transactions. It supports deployment in environments that require high availability and reliability, which makes it suitable for storing data in production systems. It is available in community and enterprise editions.
Neo4j also has a good ecosystem of tools for development and for supporting operational activities. It provides drivers and integrations that connect applications built on a wide array of technology stacks.
#database #graph #neo4j
Although I prefer using SSIS for data transfer operations, Neo4j doesn’t have any official or stable free SSIS component. While searching, I found a third-party component that is [still in the beta version].
Another approach for migrating SQL Server graphs to Neo4j is to export data into flat files and then [import them into Neo4j].
The third approach is to develop a small application using C# to migrate nodes and edges created in SQL Server to a Neo4j database. This approach is explained in detail in this section.
#graph database #database #neo4j #neo4j database
At the very beginning of most development endeavors lies an important question: What database do I choose? There is such an abundance of database technologies at this moment, it’s no wonder many developers don’t have the time or energy to research new ones. If you are one of those developers and you aren’t very familiar with graph databases in general, you’ve come to the right place!
In this article, you will learn about the main differences between a graph database and a relational database, which use cases are best suited to each database type, and what their strengths and weaknesses are.
#graph-database #relational-database #graph-theory #graph-analysis #data-analytics #networks #data #database
Deep learning and knowledge graph technologies have developed rapidly in recent years. Compared with the “black box” of deep learning, knowledge graphs are highly interpretable and are therefore widely adopted in scenarios such as search recommendations, intelligent customer support, and financial risk management.
Meituan has been digging deep into the connections buried in its huge amount of business data over the past few years and has gradually developed knowledge graphs in nearly ten areas, including cuisine graphs, tourism graphs, and commodity graphs. The ultimate goal is to make local life smarter.
Compared with a traditional RDBMS, graph databases can store and query knowledge graphs more efficiently. Selecting a graph database as the storage engine yields a clear performance advantage in multi-hop queries. Currently, there are dozens of graph database solutions on the market.
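A multi-hop query asks which vertices can be reached from a starting vertex within a given number of edges. The sketch below shows the idea as a breadth-first traversal in Python. The adjacency data is hypothetical, and the code illustrates the query pattern only, not how any particular graph database executes it.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Return {vertex: hop_count} for vertices reachable in at most `max_hops` edges."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if distances[vertex] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in distances:
                distances[neighbor] = distances[vertex] + 1
                queue.append(neighbor)
    del distances[start]  # the start vertex itself is not part of the result
    return distances


# Hypothetical knowledge-graph fragment, used only for illustration.
adjacency = {
    "user": ["restaurant_a", "restaurant_b"],
    "restaurant_a": ["dish_x"],
    "dish_x": ["ingredient_y"],
}

print(within_hops(adjacency, "user", 2))
```

Each hop follows stored edges directly. In a relational model, each additional hop would typically be expressed as another join.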
It is imperative for the Meituan team to select a graph database solution that meets the business requirements and to use that solution as the basis of Meituan’s graph storage and graph learning platform. Based on the current state of the business, the team has outlined the following basic requirements:
By having control over the source code, the Meituan team can ensure data security and service availability.
The knowledge graph data in Meituan can reach hundreds of billions of vertices and edges in total, and the throughput can reach tens of thousands of QPS. At that scale, a single-node deployment cannot meet Meituan’s storage requirements.
To ensure the best search experience for Meituan users, the team strictly limits the timeout value along every call chain. Therefore, a query response time measured in seconds is unacceptable.
The knowledge graph data is usually stored in data warehouses like Hive. The graph database should be equipped with the capability to quickly import data from such warehouses to the graph storage to ensure service effectiveness.
The Meituan team has tried the top 30 graph databases on DB-Engines and found that most well-known graph databases, for example Neo4j, ArangoDB, Virtuoso, TigerGraph, and RedisGraph, support only single-node deployment in their open-source editions. This means that the storage service cannot scale horizontally and the requirement to store large-scale knowledge graph data cannot be met.
After thorough research and comparison, the team has selected the following graph databases for the final round: Nebula Graph (developed by a startup team who originally came from Alibaba), Dgraph (developed by a startup team who originally came from Google), and HugeGraph (developed by Baidu).
#database #tutorial #graph database #database performance #nebula graph #dgraph #graph database adoption
When it comes to graph data processing, we have experience with various graph databases. In the beginning, we used the stand-alone edition of AgensGraph. Later, due to its performance limitations, we switched to JanusGraph, a distributed graph database. I described how we migrated the data in my article “Migrate tens of billions of graph data into JanusGraph (only in Chinese)”. As the data size and the number of business calls grew, a new problem appeared: each query consumed too much time. In some business scenarios, a single query took up to 10 seconds, and as the data size increased, a more complicated single query needed two or three seconds. These problems seriously affected the performance of the entire business process and the development of related businesses.
The architecture of JanusGraph is what makes a single query time-consuming. The core reason is that it depends on external storage, which JanusGraph cannot control well. Our production environment uses an HBase cluster, so not all queries can be pushed down to the storage layer for processing. Instead, data must first be fetched from HBase into JanusGraph Server memory and then filtered there.
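The cost of filtering outside the storage layer can be illustrated with a toy Python model. Everything here is hypothetical: the rows, the function names, and the count of "transferred" rows stand in for data crossing from storage into the query server, and none of it reflects the actual JanusGraph or HBase APIs.

```python
# Hypothetical edge rows held by the storage layer.
STORED_EDGES = [{"src": 1, "dst": n, "weight": n % 10} for n in range(1000)]


def fetch_then_filter(predicate):
    """Without pushdown: every row is moved to the server, then filtered there."""
    transferred = list(STORED_EDGES)
    matches = [row for row in transferred if predicate(row)]
    return matches, len(transferred)


def filter_pushed_down(predicate):
    """With pushdown: the storage layer filters, and only matches are moved."""
    matches = [row for row in STORED_EDGES if predicate(row)]
    return matches, len(matches)


def heavy(row):
    return row["weight"] == 9


server_side, moved_without_pushdown = fetch_then_filter(heavy)
storage_side, moved_with_pushdown = filter_pushed_down(heavy)

print(moved_without_pushdown, moved_with_pushdown)
```

Both strategies return the same rows. The difference is how much data has to leave the storage layer before the filter is applied.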
#database #tutorial #graph database #database performance #nebula graph #graph database adoption
Indexes are an indispensable function in a database system. Graph databases are no exception.
An index is actually a sorted data structure in the database management system. Different database systems adopt different sorting structures.
Popular index types include:
Each of them uses its own sorting algorithm.
A database index allows efficient data retrieval from databases. Despite the improvement in query performance, indexes have some disadvantages:
Taking the above into consideration, Nebula Graph now supports indexes for more efficient retrieval on properties.
This post gives a detailed introduction to the design and practice of indexes in Nebula Graph.
Below is a list of common Nebula Graph index terms we use across the post.
There are two typical ways to query data in Nebula Graph, or more generally in a graph database:
In the latter scenario, a high-performance scan is needed to fetch the edges or vertices as well as the property values.
To improve the query efficiency of property values, we’ve implemented indexes in Nebula Graph. Because an index keeps the property values of edges or vertices sorted, users can quickly locate a certain property value and avoid a full scan.
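The benefit of keeping property values sorted can be sketched in a few lines of Python using binary search. The vertex data and the `age` property are hypothetical, and the sorted list is a stand-in for illustration only. It does not represent Nebula Graph's actual index layout.

```python
import bisect

# Hypothetical vertices: vertex id -> value of an "age" property.
vertices = {101: 34, 102: 27, 103: 41, 104: 27, 105: 19}

# The index is a list of (property_value, vertex_id) pairs kept in sorted order.
age_index = sorted((age, vid) for vid, age in vertices.items())


def lookup_by_scan(age):
    """Full scan: inspect every vertex to find matching property values."""
    return sorted(vid for vid, value in vertices.items() if value == age)


def lookup_by_index(age):
    """Index lookup: binary-search to the first match, then read adjacent entries."""
    position = bisect.bisect_left(age_index, (age,))
    result = []
    while position < len(age_index) and age_index[position][0] == age:
        result.append(age_index[position][1])
        position += 1
    return result


print(lookup_by_index(27))
```

The scan touches every vertex, while the index lookup jumps to the first matching entry and stops as soon as the value changes.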
Here is what we found to be required for indexes to work in a graph database:
Below is a diagram of how indexes are stored in Nebula Graph. Indexes are a part of Nebula Graph’s Storage Service so we place them in the big picture of its storage architecture.
As the figure above shows, each Storage Server can contain multiple Storage Engines, and each Storage Engine can contain multiple Partitions.
Different Partitions are synchronized via the Raft protocol. Each Partition contains both data and indexes. The data and indexes of the same vertex or edge are stored in the same Partition.
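The co-location of data and indexes can be sketched as follows in Python. The partition count, the modulo-based placement, and the `age` property are assumptions made for illustration, since the text does not state how Nebula Graph assigns vertices to Partitions. Raft replication is not modeled.

```python
import bisect

NUM_PARTITIONS = 4  # hypothetical partition count


class Partition:
    """Holds both the data and the index entries for the vertices it owns."""

    def __init__(self):
        self.data = {}   # vertex_id -> properties
        self.index = []  # sorted (property_value, vertex_id) pairs


partitions = [Partition() for _ in range(NUM_PARTITIONS)]


def partition_for(vertex_id):
    # Assumed placement rule: a simple modulo on the vertex id.
    return vertex_id % NUM_PARTITIONS


def insert_vertex(vertex_id, age):
    """Write the vertex data and its index entry into the same Partition."""
    partition = partitions[partition_for(vertex_id)]
    partition.data[vertex_id] = {"age": age}
    bisect.insort(partition.index, (age, vertex_id))


for vertex_id, age in [(1, 30), (2, 25), (5, 41), (6, 19), (9, 25)]:
    insert_vertex(vertex_id, age)

print(partitions[1].index)
```

Because the index entry always lands next to the data it points to, an index lookup and the subsequent data read never have to cross Partitions.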
#tutorial #graph database #index #database indexes #nebula graph #database