My colleague and I (the credit has to go to Dimitry Apter, who’s done most of the actual work) were recently tasked with what looked like a simple job: building multi-filter navigation for an e-commerce site. Since this feature is by no means unique, I was certain it would take a five-minute Google search and Bob’s your uncle. I was bitterly disappointed. I read about the filters aggregation, the terms aggregation, the nested aggregation, the composite aggregation, and on and on. It took almost a day to find the answer I actually needed. Hopefully people working on similar tasks will now stumble upon this post quickly and find it useful.

The scenario is quite simple. Suppose we have a shop selling clothes, and our clothes have just five attributes: category, color, brand, style and size, i.e. five facets the shopper can filter on, each stored as a keyword field in Elasticsearch.

To make this post self-contained, let’s first synthesise some toy data.

import pandas as pd
import numpy as np

np.random.seed(42)  # optional: makes the random toy data reproducible

n_items = 50
# Roughly 70% dresses and 30% pants; the other attributes are filled in below
df = pd.DataFrame({'Category': np.random.choice(['Dress', 'Pants'], size=n_items, p=[0.7, 0.3]),
                   'Color': None, 'Style': None, 'Brand': None, 'Size': None})
dress_styles = ['Maxi', 'Evening', 'Shift', 'Sheath']
dress_brands = ['Hermes', 'Prada', 'Chanel', 'Fendi', 'Armani']
pants_styles = ['Culottes', 'Tights', 'Dungarees']
pants_brands = ["Levi's", "Wrangler", "Armani", "Calvin Klein", "Diesel"]
sizes = ['S', 'M', 'L', 'XL']
colors = ['Green', 'Black', 'White', 'Red', 'Blue']

# Style and Brand depend on the category, so fill them in per category
for category, styles, brands in [('Dress', dress_styles, dress_brands),
                                 ('Pants', pants_styles, pants_brands)]:
    mask = df['Category'] == category
    n = mask.sum()
    df.loc[mask, 'Style'] = np.random.choice(styles, size=n)
    df.loc[mask, 'Brand'] = np.random.choice(brands, size=n)

# Color and Size are independent of the category
df['Color'] = np.random.choice(colors, size=n_items)
df['Size'] = np.random.choice(sizes, size=n_items)
df['id'] = list(range(1, len(df) + 1))
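Before indexing anything, it can help to know what the facet counts ought to be, so the Elasticsearch output can be checked against them later. Below is a minimal sketch of a hypothetical helper, expected_facet_counts (my name, not part of pandas or Elasticsearch), that counts facet values among the products of one category using only the standard library. It assumes each product is a plain dict whose keys match the DataFrame columns; the four toy products are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping

FACETS = ["Color", "Style", "Brand", "Size"]


def expected_facet_counts(records: Iterable[Mapping], category: str,
                          facets: List[str] = FACETS) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: count each facet value among products of `category`.

    These are the counts a terms aggregation on each facet should report
    for a query restricted to that category.
    """
    counters = {facet: Counter() for facet in facets}
    for record in records:
        if record["Category"] != category:
            continue  # only products of the requested category are counted
        for facet in facets:
            counters[facet][record[facet]] += 1
    # most_common() orders values by count, highest first
    return {facet: dict(c.most_common()) for facet, c in counters.items()}


# Made-up products, not the random data generated above
toy = [
    {"Category": "Dress", "Color": "Red", "Style": "Maxi", "Brand": "Prada", "Size": "S"},
    {"Category": "Dress", "Color": "Black", "Style": "Shift", "Brand": "Prada", "Size": "M"},
    {"Category": "Pants", "Color": "Red", "Style": "Tights", "Brand": "Diesel", "Size": "S"},
    {"Category": "Dress", "Color": "Red", "Style": "Maxi", "Brand": "Fendi", "Size": "S"},
]
print(expected_facet_counts(toy, "Dress")["Color"])  # {'Red': 2, 'Black': 1}
```

With the DataFrame above, the equivalent call would be expected_facet_counts(df.to_dict(orient="records"), "Dress"). Compare the results as dicts rather than by order, since Elasticsearch breaks ties between equal counts differently.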

Now let’s insert our data into an Elasticsearch index (I was using ES 7.7):

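from elasticsearch import Elasticsearch, helpers


def filterKeys(document, df):
    # Keep only the DataFrame's columns as the document body
    return {key: document[key] for key in df.columns.values}


def doc_generator(df, index_name):
    # Yield one bulk "index" action per DataFrame row
    for index, document in df.iterrows():
        yield {
            "_index": index_name,
            "_id": f"{document['id']}",
            "_source": filterKeys(document, df),
        }


# Written for the 7.x elasticsearch-py client; the 8.x client expects a full URL such as 'http://localhost:9200'
es_client = Elasticsearch('localhost:9200')
index_name = 'faceted_navigation'
# Every facet is a keyword field: exact, unanalysed values, which is what terms aggregations bucket on
es_client.indices.create(index=index_name, body={"mappings": {"properties": {
    "id": {"type": "integer"},
    "Category": {"type": "keyword"},
    "Color": {"type": "keyword"},
    "Brand": {"type": "keyword"},
    "Style": {"type": "keyword"},
    "Size": {"type": "keyword"}
}}})
# refresh=True makes the documents searchable as soon as the bulk request returns
helpers.bulk(es_client, doc_generator(df, index_name), request_timeout=120, refresh=True)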
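Since every facet is mapped the same way, the mapping body could also be generated from a list of facet names instead of being written out by hand. A minimal sketch, where build_mapping is a hypothetical helper of mine; it produces the same body as the mapping above. The keyword type matters: keyword values are stored as-is, so a terms aggregation on Brand keeps "Calvin Klein" as a single bucket, whereas a text field is tokenised and is not aggregatable by default (fielddata is disabled for text fields).

```python
from typing import Dict, List

# Same facet fields as the hand-written mapping above
FACETS = ["Category", "Color", "Brand", "Style", "Size"]


def build_mapping(facets: List[str]) -> Dict:
    """Hypothetical helper: map `id` as integer and every facet as keyword."""
    properties = {"id": {"type": "integer"}}
    for facet in facets:
        properties[facet] = {"type": "keyword"}
    return {"mappings": {"properties": properties}}


body = build_mapping(FACETS)
# With the 7.x client: es_client.indices.create(index=index_name, body=body)
```

Adding a sixth facet later then means appending one name to the list rather than editing nested JSON.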

Suppose I’m searching for a dress. My query would look something like this:

{
    "size": 0,
    "query": {
        "match": {
            "Category": "Dress"
        }
    },
    "aggs": {
        "Color": {
            "terms": {
                "field": "Color",
                "size": 10
            }
        },
        "Size": {
            "terms": {
                "field": "Size",
                "size": 10
            }
        },
        "Brand": {
            "terms": {
                "field": "Brand",
                "size": 10
            }
        },
        "Style": {
            "terms": {
                "field": "Style",
                "size": 10
            }
        }
    }
}
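Writing this body by hand for every category gets repetitive, and the response nests the counts a few levels deep: each terms aggregation comes back under aggregations, then its name, then a buckets list of objects with key and doc_count. A minimal sketch of two hypothetical helpers (my names, not library functions): build_facet_query reproduces the JSON above for any category, and parse_facets flattens the response into {facet: {value: count}}. A terms size of 10 is enough here because no facet in the toy data has more than five distinct values. The response in the example is hand-written with made-up counts, not real Elasticsearch output.

```python
from typing import Dict, List

# Facets aggregated by the query above, in the same order
FACETS = ["Color", "Size", "Brand", "Style"]


def build_facet_query(category: str, facets: List[str] = FACETS, size: int = 10) -> Dict:
    """Hypothetical helper: the query body above, for an arbitrary category."""
    return {
        "size": 0,  # only the aggregations are needed, not the hits themselves
        "query": {"match": {"Category": category}},
        "aggs": {facet: {"terms": {"field": facet, "size": size}} for facet in facets},
    }


def parse_facets(response: Dict) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: flatten terms-aggregation buckets into {facet: {value: doc_count}}."""
    return {
        name: {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
        for name, agg in response["aggregations"].items()
    }


# Hand-written, abbreviated response with made-up counts
fake_response = {
    "hits": {"total": {"value": 3, "relation": "eq"}, "hits": []},
    "aggregations": {
        "Color": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{"key": "Red", "doc_count": 2}, {"key": "Black", "doc_count": 1}],
        },
    },
}
print(parse_facets(fake_response))  # {'Color': {'Red': 2, 'Black': 1}}
```

Against the live index (7.x client assumed), the two would be combined as parse_facets(es_client.search(index=index_name, body=build_facet_query("Dress"))).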


Faceted navigation for e-commerce with Elasticsearch