In the last post, we started working on the Titanic Kaggle competition. If you haven't read it yet, you can find it here. In this post, we will develop predictive models using machine learning.

If you followed the last post, our data is now ready for modeling. There are plenty of predictive algorithms out there to try. Since ours is a classification problem, I will try the following classification algorithms.

  • Support Vector Machine
  • K-Nearest Neighbor
  • Linear SVC
  • Decision Tree
  • Random Forest

First things first.

Import the required machine learning libraries

To develop a machine learning model, we need to import the Scikit-learn library.

_Scikit-Learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities._ — Scikit-Learn

Let’s import all the required algorithms from Scikit-Learn.

## machine learning
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

Each classification algorithm requires train and test data: the training set to fit the model and the test set to make predictions on.

## drop the Name, Survived, and PassengerId columns
X_train = train_data.drop(["Name", "Survived", "PassengerId"], axis=1)
Y_train = train_data["Survived"]
X_test  = test_data.drop(["Name", "PassengerId"], axis=1).copy()
X_train.shape, Y_train.shape, X_test.shape
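
If you are using the standard Kaggle Titanic files, this should show 891 training rows and 418 test rows; the number of feature columns depends on the preprocessing done in the last post.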

Support Vector Machine

_A support-vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks like outliers detection._ — Wikipedia

## Support Vector Machine
svc = SVC()
svc.fit(X_train, Y_train)
svm_Y_pred = svc.predict(X_test)
svc_accuracy = svc.score(X_train, Y_train)
svc_accuracy
## output
0.6823793490460157
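
Note that svc.score(X_train, Y_train) measures accuracy on the same data the model was trained on, so it can overestimate performance on unseen data; the same caveat applies to every score reported this way below.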

K-Nearest Neighbor

_In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors._ — Wikipedia

## k-nearest neighbor
knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(X_train, Y_train)
knn_Y_pred = knn.predict(X_test)
knn_accuracy = knn.score(X_train, Y_train)
knn_accuracy
## output
0.8406285072951739
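
That covers the first two algorithms from our list. The remaining three follow the same fit-and-score pattern; here is a minimal sketch (the variable names are my own choices, and the accuracy values will depend on how the data was prepared in the last post):

## Linear SVC
linear_svc = LinearSVC()
linear_svc.fit(X_train, Y_train)
linear_svc.score(X_train, Y_train)

## Decision Tree
decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, Y_train)
decision_tree.score(X_train, Y_train)

## Random Forest
random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, Y_train)
random_forest.score(X_train, Y_train)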

