In Part 1 (you can read it here), I discussed the Business Case for Predicting Visitor-to-Customer Conversion for an Online Store and covered Exploratory Data Analysis of the training dataset.
In this part, I will cover Data Preprocessing and the application of two Supervised Learning algorithms, namely RandomForest and XGBoost, to the prepared training dataset.
So without further ado, let’s go to Data Preprocessing!
“As you sow, so shall you reap.” This proverb, so true of life in general, is also very much true of Data Science! We cannot feed crappy data to our algorithms and expect them to magically give us accurate predictions.
Getting the data into a form that can be fed into a learning algorithm is a vital part of a Data Scientist's job.
As I mentioned in Part 1, the attributes in this data challenge were a mix of discrete and continuous variables with widely varying ranges, as well as categorical variables with widely varying class sizes. Take a look at the short document that I have created to describe the attributes here.
The key elements of the Data Preprocessing Strategy that I used are the following:
The pandas DataFrame is then used to feed data to the machine learning algorithms.
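To make the idea of feeding a DataFrame to a learning algorithm concrete, here is a minimal sketch using scikit-learn. It is not the exact preprocessing I used for the challenge: the toy data, the column names (`page_views`, `session_seconds`, `device`, `converted`) and the choice of `StandardScaler` plus `OneHotEncoder` are all assumptions made purely for illustration. It shows one common way to handle numeric attributes with very different ranges alongside a categorical attribute before fitting a RandomForest.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: these columns are illustrative only,
# not the actual attributes of the data challenge.
df = pd.DataFrame({
    "page_views": [1, 12, 3, 40, 2, 25, 5, 30],
    "session_seconds": [30.0, 900.5, 45.2, 2400.0, 12.0, 1500.0, 60.0, 1800.0],
    "device": ["mobile", "desktop", "mobile", "desktop",
               "tablet", "desktop", "mobile", "tablet"],
    "converted": [0, 1, 0, 1, 0, 1, 0, 1],
})

numeric_cols = ["page_views", "session_seconds"]
categorical_cols = ["device"]

# Scale the numeric columns and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Chain preprocessing and the classifier so both are fit together.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])

X = df[numeric_cols + categorical_cols]
y = df["converted"]

model.fit(X, y)
predictions = model.predict(X)
```

Tree ensembles such as RandomForest do not strictly need scaled inputs, so the scaling step here mainly illustrates how widely varying ranges can be brought onto a common scale; it matters more for distance-based or linear models.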