Many times I have visited Kaggle looking for solutions or different datasets. I have taken several machine learning courses, and all of them, at one point or another, use a dataset from Kaggle. It makes sense, since the datasets are well described, split into training and testing sets, and come with many features for you to explore. So I decided to jump into Kaggle and try my first competition. The best starting point is the Titanic dataset, which is Kaggle's getting-started competition. For those who don’t know, RMS Titanic was a British passenger liner operated by the White Star Line that sank in the North Atlantic Ocean in the early morning hours of 15 April 1912. You can read more on Wikipedia, and there is also a beautiful movie called Titanic.

The idea is to use the Titanic passenger data (name, age, ticket price, etc.) to predict who will survive and who will die. It is kind of creepy, but it is a valid approach. So let’s start by loading the dataset. In my case, I downloaded it as a zip file from Kaggle.

We will use Python and Jupyter Notebook. Let’s start with our imports and extracting the .zip file. I’m also setting a Seaborn style that I like.
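Since the original code was shown as a screenshot, here is a minimal sketch of what this step could look like. The helper name `load_titanic`, the archive name `titanic.zip`, the `data` output folder, and the `whitegrid` style are all my assumptions for illustration. The file names `train.csv` and `test.csv` are the ones inside the Kaggle Titanic archive.

```python
import zipfile
from pathlib import Path

import pandas as pd


def load_titanic(zip_path, extract_dir="data"):
    """Extract the Kaggle archive and return (train, test) DataFrames.

    `load_titanic` is a hypothetical helper, not part of any library.
    """
    extract_dir = Path(extract_dir)
    # extractall creates the target directory if it does not exist yet
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)
    train = pd.read_csv(extract_dir / "train.csv")
    test = pd.read_csv(extract_dir / "test.csv")
    return train, test


# Example usage in a notebook (archive name and style are assumptions):
# import seaborn as sns
# sns.set_style("whitegrid")
# train, test = load_titanic("titanic.zip")
# train.head()
```

Wrapping the extraction and loading in one function keeps the notebook cell short, and calling `train.head()` afterwards gives a quick first look at the data frame.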

import of libraries


Head view of Data frame

We have our training and testing data loaded. The training dataset contains 891 examples and 12 columns, including the label, and the testing dataset contains 418 rows and 11 columns, with no label. Next is the description of the features.

  • Passenger ID (the PassengerId column) identifies the passenger and is a numerical feature.
  • Survived is our label. As we can see, it is a binary feature: 1 if the passenger survived and 0 otherwise.
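The shapes and the binary label described above can be checked with a few lines of pandas. This is only a sketch: `describe_split` and `label_counts` are hypothetical helpers, and the tiny data frames below are made-up stand-ins so the snippet runs without the Kaggle files. On the real data, the training and testing shapes should match the 891 × 12 and 418 × 11 mentioned earlier.

```python
import pandas as pd


def describe_split(train, test):
    """Report both shapes and the columns that only the training data has."""
    label_columns = [c for c in train.columns if c not in test.columns]
    return {
        "train_shape": train.shape,
        "test_shape": test.shape,
        "label_columns": label_columns,
    }


def label_counts(train, label="Survived"):
    """Count how many rows fall into each class of the label."""
    return train[label].value_counts().sort_index()


# Made-up toy rows, NOT real Titanic records, just to exercise the helpers
train = pd.DataFrame(
    {"PassengerId": [1, 2, 3], "Survived": [0, 1, 1], "Age": [22.0, 38.0, 26.0]}
)
test = pd.DataFrame({"PassengerId": [4, 5], "Age": [35.0, 27.0]})

summary = describe_split(train, test)
print(summary)
print(label_counts(train))
```

Comparing the column lists is a quick way to confirm that the label is the only thing missing from the testing data.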

