Characteristic Based Similarity for New Products Forecasting

“How much of it am I going to sell?” is the question every retailer has in mind when thinking about adding a new product to their stores and e-commerce.

We usually have a lot of techniques to help us with forecasting, but when we talk about new products we face a big little problem: there is no sales data for the new product to feed into a regression or a time series algorithm.

So… how can we forecast it?

Approach

One way to work around this lack of data is to find a product that is similar to the new one and copy a percentage of its historical sales data. With that borrowed history, you can apply your preferred forecasting technique.
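As a rough illustration of the "copy a percentage" idea, here is a minimal sketch in plain Python. The sales numbers, the 80% share, and the `borrow_history` helper are all made up for the example; in practice the share is a business decision.

```python
# Hypothetical weekly unit sales of an existing product judged similar to the new one.
similar_product_sales = [120, 135, 150, 160, 140, 155]

# Assumed share of the similar product's volume that we expect the new product to reach.
# This value is illustrative only; it is not derived from any data here.
share = 0.8


def borrow_history(sales, share):
    """Build a proxy sales history by scaling each period by the chosen share."""
    return [round(units * share, 2) for units in sales]


# The proxy history can then be fed to whatever forecasting technique you prefer.
proxy_history = borrow_history(similar_product_sales, share)
print(proxy_history)
```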

You can compare products in several different ways, and each business will have to define the ideal features to use in this analysis.

Once the features are defined, which is the most difficult and important part of the process, you follow a few simple steps to reach your goal:

1. Apply one-hot encoding to categorical data

By applying one-hot encoding, we go from comparing distances between category names/texts to comparing whether or not the new product is in the same category as the existing one.
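To make this concrete, here is a small sketch of one-hot encoding written by hand. The product names, the `category` column, and the `one_hot` helper are hypothetical; libraries such as pandas (`get_dummies`) offer the same transformation ready-made.

```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column untouched.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one flag per category: 1 if the product belongs to it, else 0.
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical catalog: two existing products and the new one.
products = [
    {"name": "old_a", "category": "shoes"},
    {"name": "old_b", "category": "shirts"},
    {"name": "new", "category": "shoes"},
]

encoded = one_hot(products, "category")
for row in encoded:
    print(row)
```

After encoding, the new product and `old_a` share the same flags, while `old_b` differs, which is exactly the "same category or not" comparison we want.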

2. Apply a scaler technique to numerical data

When we talk about distance, it’s always something like A - B = C. Here, though, we are talking about the distance across several characteristics, so the result is something like the sum of all the Cs you get by subtracting each B from its A.
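Here is a tiny sketch of that idea. It assumes we take the absolute value of each difference before summing (the Manhattan distance), which is one common choice and not necessarily the only one; the feature values are invented.

```python
def distance(a, b):
    """Sum of the absolute per-feature differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical feature vectors for the new product and an existing one.
new_product = [1, 0, 3]
old_product = [1, 1, 5]

# Per-feature differences are 0, 1 and 2, so the total is 3.
total = distance(new_product, old_product)
print(total)  # 3
```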

That would be no problem if every column had the same range of values, but I’m quite sure that this is very, very rare.

Let’s say that we have 3 columns, one has a range of values from 200 to 400, another column is something like 15k to 78k, and a third one is from 1 to 10.
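The sketch below uses those three ranges to show why scaling matters. It assumes min-max scaling (one possible scaler among several; scikit-learn's `MinMaxScaler` does the same job) and the same sum-of-absolute-differences distance as an illustrative choice. The three products are invented.

```python
# (min, max) of each column, taken from the three ranges described above.
ranges = [(200, 400), (15_000, 78_000), (1, 10)]


def min_max_scale(values, ranges):
    """Map each value to the 0-1 interval using its column's min and max."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(values, ranges)]


def distance(a, b):
    """Sum of the absolute per-feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical products: candidate_x is almost identical to the new product,
# candidate_y sits at the opposite end of the two small-range columns.
new = [300, 20_000, 2]
candidate_x = [300, 21_000, 2]
candidate_y = [400, 20_000, 10]

# Without scaling, the 15k-78k column dominates and candidate_y looks closer.
raw_x = distance(new, candidate_x)  # 1000
raw_y = distance(new, candidate_y)  # 108

# With scaling, every column weighs the same and candidate_x is clearly closer.
scaled_new = min_max_scale(new, ranges)
scaled_x = distance(scaled_new, min_max_scale(candidate_x, ranges))
scaled_y = distance(scaled_new, min_max_scale(candidate_y, ranges))

print(raw_x, raw_y)
print(round(scaled_x, 3), round(scaled_y, 3))
```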

towardsdatascience.com
