In this blog, exploratory data analysis of the M5 competition data is performed using R, and sales for the next 28 days are forecast using XGBoost, CatBoost, LightGBM, and Facebook Prophet. The best model is chosen by comparing the SMAPE (symmetric mean absolute percentage error) of each model and applying the one-standard-error rule.
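To make the model-selection criteria concrete, here is a minimal Python sketch of SMAPE and of the one-standard-error rule. It is an illustration rather than the code used for this blog: the function names, the model names `model_a` and `model_b`, and the per-fold error values are all hypothetical. It uses the common SMAPE definition that ranges from 0 to 200 percent, and it assumes the candidate models are listed from simplest to most complex.

```python
import math
import statistics


def smape(actual, forecast):
    """Symmetric MAPE in percent (0 to 200).

    Assumption: a term where actual and forecast are both zero counts as 0,
    which avoids a 0/0 division on days with no sales.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        if denom != 0:
            total += abs(f - a) / denom
    return 100 * total / len(actual)


def one_standard_error_pick(fold_errors):
    """Pick a model with the one-standard-error rule.

    fold_errors maps model name -> list of per-fold errors (at least two folds).
    Assumption: the dict is ordered from simplest to most complex model.
    Returns the first (simplest) model whose mean error is within one
    standard error of the best model's mean error.
    """
    means = {name: statistics.mean(errs) for name, errs in fold_errors.items()}
    best = min(means, key=means.get)
    best_errs = fold_errors[best]
    std_error = statistics.stdev(best_errs) / math.sqrt(len(best_errs))
    threshold = means[best] + std_error
    for name in fold_errors:  # dicts preserve insertion order
        if means[name] <= threshold:
            return name
    return best


# Hypothetical per-fold SMAPE values, for illustration only.
example_errors = {
    "model_a": [10.3, 10.5, 10.4],  # simpler model
    "model_b": [10.0, 10.6, 10.3],  # more complex model, slightly lower mean
}
chosen = one_standard_error_pick(example_errors)
```

In this made-up example `model_b` has the lowest mean error, but `model_a` falls within one standard error of it, so the rule prefers the simpler `model_a`.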
The Makridakis Competitions (also known as the M Competitions) are a series of open competitions organized by teams led by forecasting researcher Spyros Makridakis, intended to evaluate and compare the accuracy of different forecasting methods. The first competition, named the M-Competition, was held back in 1982 with only 1001 time series; the complexity of the models and the scale of the data have increased with every successive iteration.
Link to competition: https://www.kaggle.com/c/m5-forecasting-accuracy
In March this year (2020), the fifth iteration, the M5 competition, was held. The M5 competition aims to forecast daily sales for the next 28 days, i.e., until 22nd May 2016, and to produce uncertainty estimates for these forecasts. In this blog, I cover only the point forecasts; uncertainty estimation will be covered in my next blog using the best model chosen here.
The dataset provided contains 42,840 hierarchical sales time series from Walmart. It covers stores in three US states (California, Texas, and Wisconsin) and includes item, department, product category, and store details for about 5 years, from 29th Jan 2011 to 24th April 2016. It also has explanatory variables such as price, SNAP events, day of the week, and special events and festivals.
Figure 1: An overview of how the M5 series data is organized
The data comprises 3049 individual products from 3 categories and 7 departments, sold in 10 stores across 3 states. The hierarchical aggregation captures the combinations of these factors, which makes it feasible to take either a bottom-up or a top-down approach. For instance, we can create a single time series for all sales, or one time series per state, and so on.
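The 42,840 figure can be reproduced from the counts above. The sketch below, in Python, assumes the 12 aggregation levels described in the M5 competition guide and then shows a toy bottom-up aggregation. The helper `bottom_up` and the rows in `toy_rows` are hypothetical; the sales numbers are made up for illustration and are not taken from the dataset.

```python
# Counts taken from the paragraph above.
N_PRODUCTS, N_CATEGORIES, N_DEPARTMENTS, N_STORES, N_STATES = 3049, 3, 7, 10, 3

# Number of series at each aggregation level (assumption: the 12-level
# breakdown from the M5 competition guide).
LEVEL_COUNTS = {
    "total": 1,
    "state": N_STATES,
    "store": N_STORES,
    "category": N_CATEGORIES,
    "department": N_DEPARTMENTS,
    "state x category": N_STATES * N_CATEGORIES,
    "state x department": N_STATES * N_DEPARTMENTS,
    "store x category": N_STORES * N_CATEGORIES,
    "store x department": N_STORES * N_DEPARTMENTS,
    "product": N_PRODUCTS,
    "product x state": N_PRODUCTS * N_STATES,
    "product x store": N_PRODUCTS * N_STORES,
}
total_series = sum(LEVEL_COUNTS.values())  # 42840


def bottom_up(rows, keys):
    """Sum item-level daily sales into coarser series.

    rows: iterable of dicts with id fields and a 'sales' list of daily units.
    keys: tuple of field names defining the level; () gives the grand total.
    Returns a dict mapping each group (a tuple of key values) to its summed series.
    """
    totals = {}
    for row in rows:
        group = tuple(row[k] for k in keys)
        if group not in totals:
            totals[group] = list(row["sales"])
        else:
            totals[group] = [a + b for a, b in zip(totals[group], row["sales"])]
    return totals


# Made-up three-day sales for one item in three stores (illustration only).
toy_rows = [
    {"state": "CA", "store": "CA_1", "item": "FOODS_1_001", "sales": [3, 0, 1]},
    {"state": "CA", "store": "CA_2", "item": "FOODS_1_001", "sales": [2, 1, 0]},
    {"state": "TX", "store": "TX_1", "item": "FOODS_1_001", "sales": [1, 1, 1]},
]
by_state = bottom_up(toy_rows, ("state",))
grand_total = bottom_up(toy_rows, ())
```

A bottom-up approach would forecast the lowest-level series and sum the forecasts in the same way; a top-down approach would forecast the aggregate and split it across the lower levels.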
Based on the data given, some of the factors that may affect sales are:
Before diving deep into data exploration, here is a quick overview of the population and median household income of each state:
California
Population: 39.51 million
Median household annual income: $75,277
Texas
Population: 29 million
Median household annual income: $59,570
Wisconsin
Population: 5.822 million
Median household annual income: $60,733
The exploratory data analysis was done to test these hypotheses.
Let's start the analysis by finding out which state recorded the highest sales, along with the sales of each department in these three states.