Working with Neo4j: RFM Analysis using E-Commerce Data

Working with Neo4j: RFM Analysis using E-Commerce Data

Working with Neo4j, I’ll try to explain how to create an example graph from the e-commerce data, using Neo4j and also touch on RFM Analysis.

Creating a graph using e-commerce data and make a RFM analysis

Relational databases are logical way to manage data, but on the other hand, alternative approaches such as graph database can be more useful in many cases. It’s known that huge companies in various industries such as eBay, Airbnb, Cisco and many others use the graph database. At this point, Neo4j shows itself as a graph database platform for managing the data.

In this article, I’ll try to explain how to create an example graph from the e-commerce data, using Neo4j and also touch on RFM Analysis.

The Data

First of all, you can find the e-commerce data I mentioned before here. At first glance, it’s clearly seen that the data consists of transactions. Therefore, the data includes a series of columns such as customer, purchased products, quantity and date of transaction.

It would be a step in the right direction to plan the schema before inserting data to Neo4j. The schema aimed to be builded in present study is as follows,

Image for post

We can start with the customers now. Creating a constraint before creating nodes both prevents duplication and performs better because it uses MERGE locks. You can create the constraint as follows,

CREATE CONSTRAINT ON (customer:Customer) ASSERT customer.customerID IS UNIQUE

Please notice that, having uniqueness for a property value is only useful in the graph if the property exists. Then you can create customer nodes as follows,

:auto 
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'https://git.io/JkIjD'
AS line
WITH 
  toInteger(line.CustomerID) AS CustomerID, 
  line WHERE NOT line.CustomerID IS null
MERGE(customer:Customer {customerID: CustomerID})
ON CREATE SET customer.country = line.Country

After creating customer nodes, it’ll be even easier to create product and transaction nodes. Likewise, firstly it would be correct to create constraint for product nodes.

CREATE CONSTRAINT ON (product:Product) ASSERT product.stockCode IS UNIQUE

There is an important point in here, when you create a constraint, Neo4j will create an index. Cypher will use that index for lookups just like other indexes. Therefore, there’s no need to create a separate index. In fact, if you try to create a constraint when there’s already an index, you’ll get an error.

After taking into account all of these, you can create product nodes as follows,

:auto 
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'https://git.io/JkIjD'
AS line
MERGE(product:Product {stockCode: line.StockCode})
ON CREATE SET product.description = line.Description

As you can see above, ON CREATE statement is used when creating nodes. If the node needs to be created, merge a node and set the properties. Similarly, you can also use the statement ON MATCH if the node already exists.

It’ll be nice to create transaction nodes just before start dealing with relationships as follows,

CREATE CONSTRAINT ON (transaction:Transaction) ASSERT transaction.transactionID IS UNIQUE;

:auto 
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'https://git.io/JkIjD'
AS line
MERGE(transaction:Transaction {transactionID: line.InvoiceNo})
ON CREATE SET transaction.transactionDate = line.InvoiceDate

Looking at the Cypher statements above, you can see that semicolon is used to separate Cypher statements. In general, you don’t need to end a Cypher statement with a semi-colon, but if you want to execute multiple Cypher statements, you must separate them.

The nodes in the graph are ready, but these nodes have no connection with each other. The connections capture the semantic relationships and context of the nodes in the graph. As it’s known, 3 types of nodes are available in the graph: Customer, transaction and product. As I mentioned at the beginning of this section, having relationships between customer-transaction and transaction-product will make this graph much more logical. The customer MADE a transaction and the transaction CONTAINS products. Here is the Cypher statement to building MADE relationships,

:auto 
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'https://git.io/JkIjD'
AS line
WITH toInteger(line.CustomerID) AS CustomerID, 
  line.InvoiceNo AS InvoiceNo
MATCH (customer:Customer {customerID: CustomerID})
MATCH (transaction:Transaction {transactionID: InvoiceNo})
MERGE (customer)-[:MADE]->(transaction)

Let’s finalize the graph by creating CONTAINS relationships,

:auto 
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'https://git.io/JkIjD'
AS line
WITH toInteger(line.CustomerID) AS CustomerID, line.InvoiceNo AS InvoiceNo
MATCH (customer:Customer {customerID: CustomerID})
MATCH (transaction:Transaction {transactionID: InvoiceNo})
MERGE (customer)-[:MADE]->(transaction)

You can now check the schema of the graph with the statement CALL db.schema.visualization(). The result is like that,

Image for post

Keep in mind that you can create the graph in a different way. For instance, the transaction could have been a relationship, not a node, and we could call it BOUGHT. As you can imagine, which one you choose depends on your business problem. At this point, you should set the rules and build the structure accordingly.

RFM Analysis

RFM analysis is a behavior-based approach grouping customers into segments. It groups the customers on the basis of their previous purchase transactions. Here are the three dimensions of RFM,

  • Recency: How recently did the customer purchase?
  • Frequency: How often do they purchase?
  • Monetary Value: How much do they spend?

Segmenting customers using RFM analysis is an important point for companies that sell in many industries. Because companies want to know the customers that are valuable to them and to ensure loyalty for all their customers.

After mentioning the dimensions of the RFM and the significance of the customer segmentation, we can get the recency, frequency and monetary value with following python code,

from py2neo import Graph
import pandas as pd

host = 'localhost'
port = 7687
user = ''
password = ''

graph = Graph(
    host=host,
    port=port,
    user=user,
    password=password
)
tx = graph.begin()

query = """
  MATCH (c:Customer)-[r1:MADE]->(t:Transaction)-[r2:CONTAINS]->(:Product)
  WITH SUM(r2.price) AS monetary,
     COUNT(DISTINCT t) AS frequency,
       c.customerID AS customer,
       MIN(
        duration.inDays(
        date(datetime({epochmillis: apoc.date.parse(t.transactionDate, 'ms', 'MM/dd/yyyy')})), 
        date()
      ).days
    ) AS recency
  RETURN customer, recency, frequency, monetary
"""

# create the dataframe
results = tx.run(query).data()
df = pd.DataFrame(results)

## edit the recency value
df['recency'] = df['recency'] - df['recency'].min()

Then, it would be a correct step to define the segments by creating percentiles for dimensions. Please keep in mind that the segmentation here can be taken to a much more advanced level and is often not that simple. Real world problems can be more complex.

## three quantiles to rfm values
df['r_val'] = pd.qcut(df['recency'], q=3, labels=range(3, 0, -1))
df['f_val'] = pd.qcut(df['frequency'], q=3, labels=range(1, 4))
df['m_val'] = pd.qcut(df['monetary'], q=3, labels=range(1, 4))

## create the segment value
df['rfm_val'] = (
    df['r_val'].astype(str) + 
    df['f_val'].astype(str) + 
    df['m_val'].astype(str)
)

## example names for segments
mapping = {
    'Best customers': '333',
    'No purchases recently': '133',
    'Low loyalty': '111',
    'New customers': '311'
}

## print the results
for k, v in mapping.items():
    print(k + ',')
    print(df[df.rfm_val == v].drop('customer', axis=1).describe().T)
    print()

machine-learning neo4j python developer

Bootstrap 5 Complete Course with Examples

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Building a simple Applications with Vue 3

Deno Crash Course: Explore Deno and Create a full REST API with Deno

How to Build a Real-time Chat App with Deno and WebSockets

Convert HTML to Markdown Online

HTML entity encoder decoder Online

Hire Machine Learning Developers in India

We supply you with world class machine learning experts / ML Developers with years of domain experience who can add more value to your business.

Hire Python Developers

Are you looking for experienced, reliable, and qualified Python developers? If yes, you have reached the right place. At **[HourlyDeveloper.io](https://hourlydeveloper.io/ "HourlyDeveloper.io")**, our full-stack Python development services...

Hire Python Developers India

Looking to build robust, scalable, and dynamic responsive websites and applications in Python? At **[HourlyDeveloper.io](https://hourlydeveloper.io/ "HourlyDeveloper.io")**, we constantly endeavor to give you exactly what you need. If you need to...

Python Tutorial - Learn Python for Machine Learning and Web Development

Learn Python for Machine Learning and Web Development. Can Python be used for machine learning? Python is widely considered as the preferred language for teaching and learning ML (Machine Learning). Can I use Python for web development? Python can be used to build server-side web applications. Why Python is suitable for machine learning? How Python is used in AI? What language is best for machine learning?

How To Plot A Decision Boundary For Machine Learning Algorithms in Python

How To Plot A Decision Boundary For Machine Learning Algorithms in Python, you will discover how to plot a decision surface for a classification machine learning algorithm.