Welcome back! In my previous post I wrote an EDA (Exploratory Data Analysis) of the Titanic Survival dataset. Check it out if you haven’t already. In this article I want to focus on building a machine learning model that predicts whether a Titanic passenger survived based on attributes such as gender, title, and age.

Before going any further, I want you to know that this project is inspired by this article: https://towardsdatascience.com/kaggle-titanic-machine-learning-model-top-7-fa4523b7c40. I implement several of the feature engineering techniques explained there, with some modifications for the sake of simplicity. Now let’s do this :)

*Note: full code available at the end of this article*.

As always, the very first thing I do is import all required modules and load the dataset.

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
df = pd.read_csv('train.csv')
```

Now let’s start the feature engineering with the *SibSp* and *Parch* columns. According to the dataset details, the two columns represent the number of siblings/spouses and the number of parents/children aboard the Titanic, respectively. The idea here is to create a new column called *FamilySize* whose value is derived from those two columns.
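Here is a minimal sketch of one way to build that column. It assumes family size means siblings/spouses plus parents/children plus the passenger themselves; the `+ 1` and the helper name `add_family_size` are my assumptions for illustration, and the small `sample` frame is a hand-made stand-in for `train.csv`, not real data.

```python
import pandas as pd

def add_family_size(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with a FamilySize column added."""
    out = frame.copy()
    # Assumption: family size = SibSp + Parch + 1 (the passenger themselves).
    out['FamilySize'] = out['SibSp'] + out['Parch'] + 1
    return out

# Hypothetical rows standing in for train.csv (illustrative values only).
sample = pd.DataFrame({'SibSp': [1, 0, 3], 'Parch': [0, 0, 2]})
sample = add_family_size(sample)
print(sample)
```

On the real dataset the same helper would be applied as `df = add_family_size(df)`. Dropping the `+ 1` is also reasonable if you only want to count relatives; it simply shifts every value by one.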
