“To better understand the marketplace, it is incumbent for organizations to look beyond their own four walls for data sources.”

Douglas Laney (VP, Gartner Research)

Intro

There have been several implementations for the popular Walmart Sales Forecast competition, all aiming to predict the company’s weekly sales.


Screenshot from the Kaggle Competition

However, all of them seem to **attempt to increase accuracy** (reduce error) by focusing mainly on two things:

1) Feature engineering (getting the most out of your features)

2) Model/parameter optimization (choosing the best model and the best parameters)

Both of the above are indeed necessary, but there is a third thing that **adds value** in a complementary way, and it’s wildly underused, not only in this use case (where, understandably, it was against the competition rules) but in most data science projects:

  • Combining external information.

In this article, we’ll build a simple sales forecasting model and then blend in external variables (done properly).

The title of this article refers to improving any model, not by doing something different, but by doing the same thing with more useful data.

So we’ll use the same model and **won’t do data wrangling** or feature engineering at any point, so that we can isolate the benefit of adding useful features.

What we’ll do

  • Step 1: Define and understand Target
  • Step 2: Make a Simple Forecast Model
  • Step 3: Add Financial Indicators and News
  • Step 4: Test the Models
  • Step 5: Measure Results

Step 1. Define and understand Target

Walmart released data containing weekly sales for 99 departments (clothing, electronics, food…) in every **physical store**, along with some additional features.


Walmart dataset screenshot

We will create an ML model with ‘Weekly_Sales’ as the target, training on the first 70% of observations and testing on the remaining, later 30%.
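A minimal sketch of a chronological 70/30 split, assuming the data sits in a pandas DataFrame with a `Date` column (as in the Kaggle files). The helper name `chronological_split` and the toy data are hypothetical. Rows are sorted by date instead of shuffled, so every test row comes no earlier than every training row.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, date_col: str = "Date", train_frac: float = 0.7):
    """Return (train, test): the earliest `train_frac` of rows and the later remainder."""
    # Stable sort keeps the original order of rows that share a date
    ordered = df.sort_values(date_col, kind="mergesort").reset_index(drop=True)
    cut = int(len(ordered) * train_frac)
    return ordered.iloc[:cut], ordered.iloc[cut:]

# Hypothetical toy data standing in for Walmart's weekly sales
weeks = pd.date_range("2010-02-05", periods=10, freq="W-FRI")
sales = pd.DataFrame({"Date": weeks, "Weekly_Sales": range(100, 110)})

train, test = chronological_split(sales)
print(len(train), len(test))  # 7 3
```

Because each Walmart date has many store/department rows, a row-based cut can split one week across both sets. Cutting on unique dates instead avoids that.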

The objective is to **minimize the prediction error** on future weekly sales.
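This section does not name the error measure. As a hedged illustration, the sketch below computes plain mean absolute error (MAE) and the weighted MAE (WMAE) that the original Kaggle competition used for scoring, where holiday weeks (the dataset’s `IsHoliday` flag) weigh 5 and all other weeks weigh 1. The function names are illustrative.

```python
import numpy as np

def mae(y_true, y_pred) -> float:
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def wmae(y_true, y_pred, is_holiday, holiday_weight: float = 5.0) -> float:
    """Weighted MAE: holiday weeks count `holiday_weight` times, other weeks once."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    w = np.where(np.asarray(is_holiday, bool), holiday_weight, 1.0)
    return float(np.sum(w * np.abs(y_true - y_pred)) / np.sum(w))

y_true = [100, 200, 300]
y_pred = [110, 190, 330]
holiday = [False, True, False]
print(round(mae(y_true, y_pred), 2))            # 16.67
print(round(wmae(y_true, y_pred, holiday), 2))  # 12.86
```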


We’ll add external variables that affect, or are related to, sales: the **dollar** index, the **oil** price, and **news** about Walmart.
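The article installs OpenBlender to obtain such series, and its API is not reproduced here. Whatever the source, one common way to blend a daily external series into weekly sales without leaking future values is a backward as-of join. This sketch assumes hypothetical `oil_price` and `dollar_index` columns with made-up values, and attaches to each sales date the latest external observation dated strictly before it.

```python
import pandas as pd

def attach_external(sales: pd.DataFrame, external: pd.DataFrame, on: str = "Date") -> pd.DataFrame:
    """For each sales row, attach the most recent external observation dated
    strictly before that row's date (no same-day or future values)."""
    return pd.merge_asof(
        sales.sort_values(on),
        external.sort_values(on),
        on=on,
        direction="backward",       # only look into the past
        allow_exact_matches=False,  # exclude observations from the same date
    )

# Hypothetical weekly sales and daily external indicators (made-up values)
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2010-02-05", "2010-02-12", "2010-02-19"]),
    "Weekly_Sales": [100.0, 110.0, 105.0],
})
external = pd.DataFrame({
    "Date": pd.to_datetime(["2010-02-04", "2010-02-05", "2010-02-11", "2010-02-19"]),
    "oil_price": [75.0, 76.0, 74.5, 77.0],
    "dollar_index": [80.1, 80.3, 80.0, 81.2],
})

merged = attach_external(sales, external)
print(merged)  # oil_price column: 75.0, 74.5, 74.5
```

Excluding same-day matches is a conservative choice; whether a same-week value is usable depends on when the forecast is actually made.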

We won’t use model/parameter optimization or feature engineering, so that any improvement can be attributed to the external features.
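To isolate the gain from external data, the model and its settings stay fixed while only the feature list changes. Below is a minimal sketch of that comparison on synthetic data. It assumes a plain `LinearRegression` (this section does not name the article’s model), hypothetical `week` and `oil_price` columns, and rows already in chronological order.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

def holdout_mae(df: pd.DataFrame, features, target="Weekly_Sales", train_frac=0.7) -> float:
    """Fit the same untuned model on the first `train_frac` of rows; return MAE on the rest."""
    cut = int(len(df) * train_frac)
    train, test = df.iloc[:cut], df.iloc[cut:]
    model = LinearRegression()  # identical model for every feature set
    model.fit(train[features], train[target])
    return mean_absolute_error(test[target], model.predict(test[features]))

# Synthetic weekly data (hypothetical): a trend plus an external driver plus noise
rng = np.random.default_rng(0)
weeks = np.arange(200)
oil = 10 * np.sin(weeks / 8.0)
df = pd.DataFrame({
    "week": weeks,
    "oil_price": oil,
    "Weekly_Sales": 1000 + 2 * weeks + 5 * oil + rng.normal(0, 5, size=200),
})

baseline = holdout_mae(df, ["week"])
augmented = holdout_mae(df, ["week", "oil_price"])
print(f"baseline MAE:  {baseline:.1f}")
print(f"augmented MAE: {augmented:.1f}")
```

On real data the gap will differ. Because nothing but the feature list changed, any difference between the two scores can be attributed to the added column.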

Step 2. Make a Simple Forecast Model

First, you need Python installed (current releases of pandas and scikit-learn require Python 3), along with the following libraries:

$ pip install pandas OpenBlender scikit-learn

Then, open a Python script (preferably a Jupyter notebook) and let’s import the needed libraries.


Immensely Improving every ‘Walmart Sales’ Demand Forecasting Model