M5 Forecasting - Accuracy




In this blog, the exploratory data analysis for the M5 competition data is performed using R, and sales for the next 28 days are forecast using XGBoost, CatBoost, LightGBM, and Facebook Prophet. The best model is chosen by comparing the SMAPE error rate and applying the one-standard-error rule.
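To make the model-selection criteria concrete, here is a minimal sketch of SMAPE and the one-standard-error rule. It is written in Python with the standard library only and is not the code used for this post. The SMAPE variant shown (absolute error divided by the average of the absolute actual and forecast, in percent) is one common definition and is an assumption here. The function names, the model names, the per-fold error values, and the "simplest first" ordering are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def smape(actual, forecast):
    """Symmetric MAPE in percent. Terms where both values are 0 count as zero error."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        terms.append(0.0 if denom == 0 else 2.0 * abs(a - f) / denom)
    return 100.0 * mean(terms)


def one_standard_error_choice(fold_errors, simplest_first):
    """Return the simplest model whose mean error is within one standard
    error of the best (lowest) mean error."""
    means = {name: mean(errs) for name, errs in fold_errors.items()}
    best = min(means, key=means.get)
    se_best = stdev(fold_errors[best]) / sqrt(len(fold_errors[best]))
    threshold = means[best] + se_best
    for name in simplest_first:
        if means[name] <= threshold:
            return name
    return best


# Invented SMAPE values per validation fold, for illustration only.
fold_errors = {
    "model_a": [12.0, 14.0, 13.0],
    "model_b": [12.5, 13.5, 13.6],
    "model_c": [15.0, 16.0, 17.0],
}
# Hypothetical ordering from simplest to most complex model.
simplest_first = ["model_c", "model_b", "model_a"]

chosen = one_standard_error_choice(fold_errors, simplest_first)
print(chosen)  # model_b
```

In this toy example model_a has the lowest mean error, but model_b is within one standard error of it and is ranked as simpler, so the rule prefers model_b.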

Background of Competition:

The Makridakis Competitions (also known as the M Competitions) are a series of open competitions organized by teams led by forecasting researcher Spyros Makridakis, intended to evaluate and compare the accuracy of different forecasting methods. The first competition, the M-Competition, was held back in 1982 with only 1001 time series; the complexity of the models and the scale of the data have increased with every successive iteration.

Link to competition: https://www.kaggle.com/c/m5-forecasting-accuracy


In March this year (2020), the fifth iteration, the M5 competition, was launched. It aims to forecast daily sales for the next 28 days, i.e., until 22 May 2016, and to make uncertainty estimates for these forecasts. In this blog, I cover only the forecasting; the uncertainty estimates will be covered in my next blog using the best model chosen here.


The dataset contains 42,840 hierarchical sales time series from Walmart. It covers stores in three US states (California, Texas, and Wisconsin) and includes item-level, department, product-category, and store details for a little over five years, from 29 January 2011 to 24 April 2016. It also has explanatory variables such as price, SNAP events, day of the week, and special events and festivals.


Figure 1: An overview of how the M5 series data is organized

The data comprises 3,049 individual products from 3 categories and 7 departments, sold in 10 stores across 3 states. The hierarchical aggregation captures the combinations of these factors, which makes it feasible to take either a bottom-up or a top-down approach. For instance, we can build a single time series for all sales, or a separate series for each state, and so on.
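As a rough illustration of how these counts combine, the sketch below (Python, standard library only) adds up the number of series at each of the competition's 12 aggregation levels and then shows a toy bottom-up aggregation. The level breakdown follows the competition's hierarchy and is not spelled out in this post; the `bottom_up` helper, the key layout, and the toy sales numbers are hypothetical.

```python
# Counts stated in the text above.
n_products, n_categories, n_departments, n_stores, n_states = 3049, 3, 7, 10, 3

# Number of series at each aggregation level of the hierarchy.
level_counts = {
    "total": 1,
    "state": n_states,
    "store": n_stores,
    "category": n_categories,
    "department": n_departments,
    "state x category": n_states * n_categories,
    "state x department": n_states * n_departments,
    "store x category": n_stores * n_categories,
    "store x department": n_stores * n_departments,
    "product": n_products,
    "product x state": n_products * n_states,
    "product x store": n_products * n_stores,
}
total_series = sum(level_counts.values())
print(total_series)  # 42840


def bottom_up(bottom_series, key_index):
    """Sum bottom-level series that share the same value at position
    key_index of their (state, store, item) key."""
    out = {}
    for key, values in bottom_series.items():
        group = key[key_index]
        if group not in out:
            out[group] = list(values)
        else:
            out[group] = [x + y for x, y in zip(out[group], values)]
    return out


# Invented daily unit sales for three item-store series (not real M5 data).
toy = {
    ("CA", "CA_1", "item_1"): [3, 0, 2],
    ("CA", "CA_2", "item_1"): [1, 1, 0],
    ("TX", "TX_1", "item_1"): [0, 2, 2],
}
by_state = bottom_up(toy, key_index=0)
print(by_state)  # {'CA': [4, 1, 2], 'TX': [0, 2, 2]}
```

In a bottom-up approach, forecasts would be produced for the item-store series and then summed in the same way to obtain the higher levels; a top-down approach forecasts the aggregate first and splits it downwards.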


Based on the data given, some of the factors that may affect sales are:

  1. *Day:* Customers' shopping time and spending depend largely on the day of the week. Many customers may prefer to shop only on weekends.
  2. *Special Events/Holidays:* Depending on the events and holidays, customers' purchasing behavior may change. For holidays like Easter, food sales may go up, and for sporting events like the Super Bowl final, household item sales may go up.
  3. *Product Price:* Product price is likely the factor that affects sales the most. Most customers will check the price tag before making the final purchase.
  4. *Product Category:* The type of product greatly affects sales. For instance, household products such as TVs will sell fewer units than food products.
  5. *Location:* Location also plays an important role in sales. In states like California, customers might buy the products they want irrespective of price, whereas customers in other regions may be more price sensitive.

Before diving into data exploration, here is a quick overview of the population and median household income of each state:


California

Population: 39.51 Million

Median Household Annual Income: $75,277


Texas

Population: 29 Million

Median Household Annual Income: $59,570


Wisconsin

Population: 5.822 Million

Median Household Annual Income: $60,733

The exploratory data analysis was done to test these hypotheses.

Exploratory Data Analysis

Let’s start the data analysis by finding out which state recorded the highest sales, and how the individual departments sold in each of the three states.
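The analysis in this post is done in R; purely to illustrate the kind of aggregation involved, here is a small sketch in Python using only the standard library. The rows are invented toy data, not real M5 figures. Only the `state_id` and `dept_id` naming mirrors the competition's sales file; everything else is an assumption.

```python
from collections import defaultdict

# Invented toy rows: (state_id, dept_id, daily unit sales). Not real M5 figures.
rows = [
    ("CA", "FOODS_1", [5, 7, 6]),
    ("CA", "HOBBIES_1", [1, 0, 2]),
    ("TX", "FOODS_1", [4, 4, 3]),
    ("WI", "FOODS_1", [2, 5, 4]),
    ("WI", "HOUSEHOLD_1", [1, 1, 0]),
]

state_totals = defaultdict(int)       # total units sold per state
state_dept_totals = defaultdict(int)  # total units sold per (state, department)

for state, dept, units in rows:
    sold = sum(units)
    state_totals[state] += sold
    state_dept_totals[(state, dept)] += sold

# State with the highest total sales in the toy data.
top_state = max(state_totals, key=state_totals.get)
print(top_state)  # CA

# Department totals within each state, largest first.
for (state, dept), sold in sorted(state_dept_totals.items(), key=lambda kv: -kv[1]):
    print(state, dept, sold)
```

On the real data the same grouping would be applied to the per-item rows of the sales file, summing across all day columns.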
