How Nebula Graph Stores a Social Network With One Trillion Connections

WeChat is a social network app that deals with large-scale heterogeneous graphs. The dataset it has to process has:

One trillion edges/connections
A total dataset of 150TB
An hourly update of 100 billion connections

Handling data at this scale is a huge challenge, and the team at WeChat encountered problems when using Nebula Graph, an open source distributed graph database.

However, by taking advantage of the database's deep customization capabilities, the team has implemented the features it needed: storage for big data, fast import of large data sets, version control, rollback within seconds, and database access with millisecond latency.

The Challenges Facing Large Internet Companies

Most well-known graph databases are not capable of dealing with truly big data. For example, the community version of Neo4j provides single-host service and is widely adopted in the knowledge graph area. However, when it comes to a very large data set this solution misses the mark. And large data sets are increasingly common in today’s business world.

Plus, there are issues like data consistency and disaster recovery to consider if you choose a multi-copy implementation. JanusGraph has solved the big data storage problem by using external metadata management, key-value storage, and indexes, yet its performance has been widely criticized. Most graph database solutions that the WeChat team evaluated were many times faster than JanusGraph.

Some Internet companies build their own databases. These self-developed solutions cater to their own business requirements rather than to general graph scenarios, so they support only a limited subset of query syntax.

GeaBase From Ant Financial

GeaBase is another option, mainly used in the finance industry. It features a self-developed query language, pushdown computation, and millisecond latency. Its main usage scenarios include risk management in financial organizations. To this end, it supports a transaction network with trillions of edges/relationships, storage of real-time transaction data, and real-time fraud detection.

It is also useful for recommendation engines, including applications like stocks and securities recommendations. For Ant Forest, it provides the capability to store trillions of nodes, strong data consistency, and low-latency querying. It also has a GNN feature that supports Dynamic Graph CNN for online inference based on dynamic graphs.

iGraph From Alibaba

There is also iGraph, a graph indexing and query system. It stores user behavior information and serves as one of the four backbone middle platforms in Alibaba. iGraph has adopted Gremlin as its graph query language for real-time queries of e-commerce relationships.

ByteGraph From ByteDance (the Company Behind TikTok)

By adding a cache layer to the key-value layer, ByteGraph splits the relationships into B+ trees for efficient access to edges and data sampling. The structure is similar to Facebook's TAO.

Architecture of the WeChat Big Data Solution

The WeChat team has come up with the following architecture to solve the big data storage and processing problem.

Why Nebula Graph?

As seen in the architecture above, a graph database is the main component of the solution. WeChat ended up selecting Nebula Graph as the starting point of its journey in exploring graph databases.

WeChat found that Nebula Graph had the most potential for handling huge datasets, based on its dataset partitioning and independent relationship storage. It also offered pushdown computation and MPP optimization built on a strongly consistent storage engine. Finally, the Nebula Graph team had extensive experience in the graph database field and a proven abstraction model for big data.

Problems in Practice With Nebula Graph

Insufficient Memory

The WeChat team encountered memory issues. In essence, this was a problem of performance versus resources: memory usage cannot be neglected in an application that deals with large-scale datasets. Several components in RocksDB contribute to memory usage: the block cache, indexes and bloom filters, memtables, and blocks pinned by iterators.

So, the WeChat team moved to optimize memory utilization. It began with block cache optimization, adopting a global LRU cache to control the cache usage of all RocksDB instances on a machine.
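The idea of a single cache budget shared by several storage instances can be illustrated with a minimal Python sketch. This is not RocksDB's API or the WeChat team's code; the class name SharedLRUCache, the instance IDs, and the tiny byte budget are all hypothetical and chosen only to show how one LRU list bounds the total memory of every instance on a host.

```python
from collections import OrderedDict


class SharedLRUCache:
    """A byte-budgeted LRU cache shared by several storage instances (illustrative)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Maps (instance_id, block_id) -> block bytes, ordered from least
        # to most recently used.
        self._blocks = OrderedDict()

    def get(self, instance_id, block_id):
        key = (instance_id, block_id)
        block = self._blocks.get(key)
        if block is not None:
            self._blocks.move_to_end(key)  # mark as most recently used
        return block

    def put(self, instance_id, block_id, block):
        key = (instance_id, block_id)
        old = self._blocks.pop(key, None)
        if old is not None:
            self.used_bytes -= len(old)
        self._blocks[key] = block
        self.used_bytes += len(block)
        # Evict the least recently used blocks, whichever instance owns them,
        # until the host-wide budget is respected again.
        while self.used_bytes > self.capacity_bytes and self._blocks:
            _, evicted = self._blocks.popitem(last=False)
            self.used_bytes -= len(evicted)


# Three hypothetical instances share one 8-byte budget.
cache = SharedLRUCache(capacity_bytes=8)
cache.put("db0", 1, b"aaaa")
cache.put("db1", 1, b"bbbb")
cache.get("db0", 1)           # db0's block becomes most recently used
cache.put("db2", 1, b"cccc")  # over budget: db1's block is evicted
```

Because eviction is decided across all instances at once, a busy instance can use more of the budget than an idle one, while the total stays capped no matter how many instances run on the machine.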

Then the team optimized the bloom filter. Each edge is modeled as a key-value pair and stored in RocksDB. If all keys were stored in a bloom filter and each key occupied 10 bits, the memory required by the entire filter would exceed the machine memory by a large margin.
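A rough back-of-the-envelope calculation shows the scale involved. It assumes one bloom filter entry per edge and uses the one trillion edges quoted at the start of the article; replication and the number of machines are not modeled, since the article does not describe them.

```python
EDGES = 10**12      # one trillion edges, from the dataset description
BITS_PER_KEY = 10   # per-key bloom filter cost quoted above

total_bytes = EDGES * BITS_PER_KEY // 8
total_tb = total_bytes / 10**12   # decimal terabytes
print(f"{total_tb:.2f} TB")       # prints "1.25 TB"
```

Under these assumptions the filters alone would need about 1.25 TB of memory across the cluster, before counting the block cache, indexes, or memtables.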

The team observed that most requests ask for the list of edges of a specific node. Therefore, the team adopted a prefix bloom filter, which filters on key prefixes rather than on every full key. Another optimization was to create indexes for properties on vertices, which accelerates most requests. As a result, the memory usage of the filter on a single host is at the gigabyte level, without sacrificing the speed of most requests.
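The memory saving of a prefix bloom filter comes from inserting one entry per distinct prefix instead of one per key. The following minimal Python sketch illustrates this under stated assumptions: the key layout in edge_key (an 8-byte source vertex ID followed by an 8-byte destination ID) is hypothetical and is not Nebula Graph's actual key format, and the PrefixBloomFilter class is a toy written for this example, not RocksDB's implementation.

```python
import hashlib


class PrefixBloomFilter:
    """Toy bloom filter that indexes only a fixed-length key prefix."""

    def __init__(self, num_bits, num_hashes, prefix_len):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.prefix_len = prefix_len
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, prefix):
        # Derive the bit positions from two halves of one digest
        # (double hashing), so only one hash computation is needed.
        digest = hashlib.sha256(prefix).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        # Only the prefix of the key is inserted, never the full key.
        for pos in self._positions(key[:self.prefix_len]):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain_prefix(self, prefix):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(prefix[:self.prefix_len])
        )


def edge_key(src, dst):
    # Hypothetical layout: 8-byte source ID, then 8-byte destination ID.
    return src.to_bytes(8, "big") + dst.to_bytes(8, "big")


bf = PrefixBloomFilter(num_bits=1024, num_hashes=7, prefix_len=8)
for dst in range(100):
    bf.add(edge_key(42, dst))  # 100 edges, but only one distinct prefix

has_edges = bf.may_contain_prefix((42).to_bytes(8, "big"))
```

All 100 edges of vertex 42 set the same handful of bits, so the filter grows with the number of vertices that have edges rather than with the number of edges. The trade-off is that the filter can no longer rule out a specific edge, only a vertex with no edges at all, which matches the access pattern described above.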
