Supercharged Data Science with LynxKite on Neo4j data

If you have a Neo4j database, then you’ve already completed a few important steps in your graph journey. You know how you want to model (some aspects of) your business as a graph, and you are probably already using your data for ad hoc graph queries, local investigations, or maybe even in your operations!

But just as you wouldn’t run table-oriented data science directly against an SQL database, you need more than just a graph database to succeed in a complex, iterative graph data science project.

In a classical, table-oriented project you would use something like Pandas or Spark from Python, drop in an ML framework like PyTorch, or use an integrated data science tool like RapidMiner or Dataiku. Whatever tool you choose, you would work with snapshots of data and end up with a complex workflow of many interdependent operations.

If you have a serious graph project on your hands, however, you should turn to LynxKite! Our new Neo4j connectors now make this super easy.

For the record, if you want to do graph data science but you do not have a graph DB yet, that’s also totally fine. You can use LynxKite directly to turn your traditional data into graphs. But this post is not about that.

Let’s now consider a simple but typical graph data science effort in detail. Take Neo4j’s Northwind dataset. (Just enter :play northwind-graph into your Neo4j browser and follow the instructions to get it.) This graph represents customers’ order histories. It contains nodes for customers, orders, products, product categories and suppliers. Our goal is to identify groups of products that cater to a similar “taste”, based on how often they were bought by the same customer.
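
To make the goal concrete, here is a minimal sketch in plain Python of what “bought by the same customer” could mean as a number. It is not LynxKite or Neo4j code. The function name co_purchase_counts and the sample rows are made up for illustration, and counting the customers who bought both products is only one possible reading of “how often”.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(purchases):
    """For each unordered product pair, count the customers who bought both.

    `purchases` is an iterable of (customer, product) pairs. This is a
    hypothetical helper for illustration, not part of any library.
    """
    # Collect the distinct products each customer has bought.
    products_by_customer = defaultdict(set)
    for customer, product in purchases:
        products_by_customer[customer].add(product)

    # Every pair of products in one customer's basket counts once.
    counts = Counter()
    for products in products_by_customer.values():
        for a, b in combinations(sorted(products), 2):
            counts[(a, b)] += 1
    return counts


# Made-up sample rows, not taken from the real dataset.
purchases = [
    ("alice", "Chai"), ("alice", "Tofu"), ("alice", "Chai"),
    ("bob", "Chai"), ("bob", "Tofu"), ("bob", "Ikura"),
    ("carol", "Ikura"),
]

counts = co_purchase_counts(purchases)
for pair, n in counts.most_common():
    print(pair, n)
```

The resulting counts can be read as edge weights of a product-to-product graph. Groups of products with a similar “taste” would then show up as densely connected clusters in that graph.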

