“To better understand the marketplace, it is incumbent for organizations to look beyond their own four walls for data sources.”
Douglas Laney (VP, Gartner Research)
The popular Walmart Sales Forecast competition has produced many implementations that try to predict the retailer’s sales.
Screenshot from the Kaggle Competition
However, all of them seem to **attempt to increase accuracy** (reduce error) by focusing mainly on two things:
1) Feature engineering (getting the most out of your features)
2) Model/parameter optimization (choosing best model & best parameters)
Both are necessary, but there is a third thing that **adds value** in a complementary way, and it’s wildly underused, not only in this use case (where, understandably, it was against the competition rules) but in most data science projects: adding external data.
In this article, we’ll build a simple sales forecast model and then blend in external variables (done properly).
The title of this article refers to improving all models: not by doing something different, but by doing the same thing with more useful data.
So we’ll use the same model throughout and **won’t do any data wrangling** or feature engineering, so that we can isolate the benefit of adding useful features.
Walmart released data containing weekly sales for 99 departments (clothing, electronics, food…) in every **physical store**, along with some additional features.
Walmart dataset screenshot
We will create an ML model with ‘Weekly_Sales’ as the target, train it on the first 70% of the observations and test it on the remaining 30%.
The objective is to **minimize the prediction error** on future weekly sales.
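As a rough illustration of this setup, here is a minimal sketch of a chronological 70/30 split followed by an error measurement. The data is synthetic, the ‘Date’ and ‘Temperature’ column names are assumptions, and the random forest and mean absolute error are just one reasonable choice of model and metric, not necessarily the ones used later in this article.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error


def chronological_split(df, date_col, train_frac=0.7):
    """Sort by date, then cut so the training rows come before the test rows."""
    ordered = df.sort_values(date_col).reset_index(drop=True)
    cut = int(round(len(ordered) * train_frac))
    return ordered.iloc[:cut], ordered.iloc[cut:]


# Synthetic stand-in for the Walmart data (all values are made up).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Date": pd.date_range("2010-02-05", periods=100, freq="W-FRI"),
    "Temperature": rng.normal(60, 10, size=100),       # hypothetical feature
    "Weekly_Sales": rng.normal(20000, 3000, size=100),  # target
})

train, test = chronological_split(toy, "Date")
features = ["Temperature"]

# Assumed model choice: any regressor with fit/predict would work here.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(train[features], train["Weekly_Sales"])
predictions = model.predict(test[features])

# Prediction error on the later, unseen 30% of the weeks.
mae = mean_absolute_error(test["Weekly_Sales"], predictions)
print(f"train rows: {len(train)}, test rows: {len(test)}, MAE: {mae:.2f}")
```

Splitting by time rather than at random matters for forecasting: a random split would let the model train on weeks that come after the ones it is tested on.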
We’ll add external variables that affect or are related to sales, such as the **dollar** index, the **oil** price and **news** about Walmart.
We won’t use model/parameter optimization or feature engineering, so we can isolate the benefit of adding the external features.
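To make the idea of blending concrete, here is a small sketch of how an external daily series could be aligned with weekly sales by date. It uses plain pandas (`merge_asof`) rather than the OpenBlender API, and the ‘oil_price’ column and all values are hypothetical.

```python
import pandas as pd

# Toy weekly sales (made-up values).
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2010-02-05", "2010-02-12", "2010-02-19"]),
    "Weekly_Sales": [24924.5, 46039.5, 41595.6],
})

# Toy external series observed on irregular days (hypothetical values).
oil = pd.DataFrame({
    "Date": pd.to_datetime(["2010-02-03", "2010-02-04", "2010-02-10", "2010-02-18"]),
    "oil_price": [77.0, 73.1, 74.5, 79.0],
})

# For each sales date, attach the most recent oil price observed strictly
# before that date, so no information from the future leaks into a row.
blended = pd.merge_asof(
    sales.sort_values("Date"),
    oil.sort_values("Date"),
    on="Date",
    direction="backward",
    allow_exact_matches=False,
)
print(blended)
```

Matching backward in time is the important design choice: each sales row only sees external values that were already known when that week’s sales were recorded.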
First, you need to have Python 2 or 3 installed and the following libraries:
$ pip install pandas OpenBlender scikit-learn
Then, open a Python script (preferably a Jupyter notebook) and let’s import the needed libraries.