My colleague and I (the credit has to go to Dimitry Apter, who did most of the actual tangible work) were recently given what looked like a relatively simple task: build a multi-filter navigation for an e-commerce site. Since this feature is by no means unique, I was certain it would be a 5-minute Google search and Bob’s your uncle. I was bitterly disappointed. I found myself reading about filters aggregations, terms aggregations, nested aggregations, composite aggregations, on and on. It took almost a day to find the answer I really needed. Hopefully people working on similar tasks will now stumble upon this post quickly and find it useful.
The scenario is quite trivial. Suppose we have a shop selling clothes and our clothes have but 5 attributes: category, color, brand, style and size, i.e. 5 facets in Elasticsearch terms.
To make this post comprehensive let’s first synthesise some toy data.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Category': np.random.choice(['Dress', 'Pants'], size=50, p=[0.7, 0.3]), 'Color': None, 'Style': None, 'Brand': None, 'Size': None})
dress_styles = ['Maxi', 'Evening', 'Shift', 'Sheath']
dress_brands = ['Hermes', 'Prada', 'Chanel', 'Fendi', 'Armani']
sizes = ['S', 'M', 'L', 'XL']
pants_styles = ['Culottes', 'Tights', 'Dungarees']
pants_brands = ["Levi's", "Wrangler", "Armani", "Calvin Klein", "Diesel"]
colors = ['Green', 'Black', 'White', 'Red', 'Blue']
size = df[df.Category == 'Dress'].shape[0]
df.loc[df['Category'] == 'Dress', ['Style']] = np.random.choice(dress_styles, size=size).reshape(size,1)
df.loc[df['Category'] == 'Dress', ['Brand']] = np.random.choice(dress_brands, size=size).reshape(size,1)
size = df[df.Category == 'Pants'].shape[0]
df.loc[df['Category'] == 'Pants', ['Style']] = np.random.choice(pants_styles, size=size).reshape(size,1)
df.loc[df['Category'] == 'Pants', ['Brand']] = np.random.choice(pants_brands, size=size).reshape(size,1)
df['Color'] = np.random.choice(colors, size=50)
df['Size'] = np.random.choice(sizes, size=50)
df['id'] = list(range(1, len(df) + 1))
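Before indexing anything, it can help to know roughly what per-facet counts to expect for a given category, so there is something to compare the search results against later. The following is only a minimal sketch using the standard library: the `rows` list is hand-written and stands in for what `df.to_dict('records')` would return, and `facet_counts` is a hypothetical helper name, not part of pandas or Elasticsearch.

```python
from collections import Counter

# Hand-written rows standing in for df.to_dict('records'); the values are illustrative only.
rows = [
    {"Category": "Dress", "Color": "Red", "Size": "M", "Brand": "Prada", "Style": "Maxi"},
    {"Category": "Dress", "Color": "Red", "Size": "S", "Brand": "Chanel", "Style": "Shift"},
    {"Category": "Dress", "Color": "Blue", "Size": "M", "Brand": "Prada", "Style": "Maxi"},
    {"Category": "Pants", "Color": "Red", "Size": "L", "Brand": "Diesel", "Style": "Tights"},
]


def facet_counts(rows, category, facets=("Color", "Size", "Brand", "Style")):
    """Count the values of each facet among the rows of one category (hypothetical helper)."""
    selected = [row for row in rows if row["Category"] == category]
    return {facet: Counter(row[facet] for row in selected) for facet in facets}


counts = facet_counts(rows, "Dress")
print(counts["Color"])
```

With the real toy data, passing `df.to_dict('records')` instead of `rows` should give the numbers to check the index against.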
Now let’s insert our data into an Elasticsearch index (I was using ES 7.7).
from elasticsearch import Elasticsearch, helpers
def filterKeys(document, df):
    # Keep only the DataFrame columns as document fields
    return {key: document[key] for key in df.columns.values}

def doc_generator(df, index_name):
    # Yield one bulk action per DataFrame row
    df_iter = df.iterrows()
    for index, document in df_iter:
        res = {
            "_index": index_name,
            "_id": f"{document['id']}",
            "_source": filterKeys(document, df)
        }
        yield res
es_client = Elasticsearch('localhost:9200')
index_name = 'faceted_navigation'
es_client.indices.create(index_name, body={"mappings": {"properties": {
    "id": {
        "type": "integer"
    },
    "Category": {
        "type": "keyword"
    },
    "Color": {
        "type": "keyword"
    },
    "Brand": {
        "type": "keyword"
    },
    "Style": {
        "type": "keyword"
    },
    "Size": {
        "type": "keyword"
    }
}}})
helpers.bulk(es_client, doc_generator(df, index_name), request_timeout=120)
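Every facet in the mapping above is declared as `keyword`, which stores the exact, unanalysed value, and that is what terms aggregations bucket on. Since the stanza repeats for each field, the mapping body could also be generated. This is just a sketch: `build_mapping` is a hypothetical helper of my own naming, not an Elasticsearch API.

```python
def build_mapping(keyword_fields, integer_fields=()):
    """Build an index body: `integer` for id-like fields, `keyword` for facets (hypothetical helper)."""
    properties = {name: {"type": "integer"} for name in integer_fields}
    properties.update({name: {"type": "keyword"} for name in keyword_fields})
    return {"mappings": {"properties": properties}}


body = build_mapping(
    ["Category", "Color", "Brand", "Style", "Size"],
    integer_fields=["id"],
)
# With the 7.x client used above, this would be passed as:
# es_client.indices.create(index_name, body=body)
print(body)
```

Adding a sixth facet then only means adding one more name to the list.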
Suppose I was searching for a dress. My query would look something like:
{
  "size": 0,
  "query": {
    "match": {
      "Category": "Dress"
    }
  },
  "aggs": {
    "Color": {
      "terms": {
        "field": "Color",
        "size": 10
      }
    },
    "Size": {
      "terms": {
        "field": "Size",
        "size": 10
      }
    },
    "Brand": {
      "terms": {
        "field": "Brand",
        "size": 10
      }
    },
    "Style": {
      "terms": {
        "field": "Style",
        "size": 10
      }
    }
  }
}
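In the query above, `"size": 0` suppresses the hits themselves, so only the aggregation buckets come back, one terms aggregation per facet. To run it from Python, the same body can be built as a dict and the buckets flattened into plain counts. This is a rough sketch under a few assumptions: `build_facet_query` and `parse_facets` are hypothetical helper names, and `sample_response` is a made-up fragment that only illustrates the response shape, not real results.

```python
def build_facet_query(category, facets=("Color", "Size", "Brand", "Style"), bucket_size=10):
    """Build the same request body as above: one terms aggregation per facet."""
    aggs = {facet: {"terms": {"field": facet, "size": bucket_size}} for facet in facets}
    return {"size": 0, "query": {"match": {"Category": category}}, "aggs": aggs}


def parse_facets(response):
    """Flatten each aggregation's buckets into {facet: {value: doc_count}}."""
    return {
        name: {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
        for name, agg in response["aggregations"].items()
    }


query = build_facet_query("Dress")
# With the 7.x client used above, the search call would be:
# response = es_client.search(index=index_name, body=query)

# Made-up response fragment, only to show the shape; the numbers are not real results.
sample_response = {
    "aggregations": {
        "Color": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "Red", "doc_count": 9},
                {"key": "Blue", "doc_count": 7},
            ],
        }
    }
}
print(parse_facets(sample_response))
```

The flattened dict is usually the more convenient shape for rendering filter checkboxes with counts next to them.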