Preface

Cardiovascular diseases are disorders of the heart and blood vessels; they typically include heart attacks, strokes, and heart failure [1]. According to the World Health Organization (WHO), cardiovascular diseases such as ischaemic heart disease and stroke have been the leading causes of death worldwide for the last decade and a half [2].


Motivation

A few months ago, a new heart failure dataset was uploaded to Kaggle. It contains the health records of 299 anonymized patients, each described by 12 clinical and lifestyle features. The task is to predict heart failure using these features.

In this post, I document my workflow on this task and present it as a research exercise. That naturally involves some domain knowledge, references to journal papers, and insights derived from them.

Warning: This post is nearly a 10-minute read and things may get a little dense as you scroll down, but I encourage you to give it a shot.


About the data

The dataset was originally released by Ahmad et al. in 2017 [3] as a supplement to their survival analysis of heart failure patients at the Faisalabad Institute of Cardiology and the Allied Hospital in Faisalabad, Pakistan. Chicco and Jurman subsequently accessed and analyzed it in 2020 to predict heart failure using several machine learning techniques [4]. The dataset hosted on Kaggle cites these authors and their research paper.

The dataset primarily consists of clinical and lifestyle features of 105 female and 194 male heart failure patients. You can find each feature explained in the figure below.


Fig. 1 — Clinical and lifestyle features of 299 patients in the dataset (credit: author)

Project Workflow

The workflow is pretty straightforward:

  1. **Data Preprocessing:** Cleaning the data, imputing missing values, creating new features if needed, etc.
  2. **Exploratory Data Analysis:** Computing summary statistics, plotting relationships, mapping trends, etc.
  3. **Model Building:** Building a baseline prediction model, followed by at least 2 classification models to train and test.
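
To make the three steps concrete, here is a minimal sketch in plain Python. It is only an illustration, not the pipeline used later in this post: the records are made-up toy values (not rows from the Kaggle dataset), the column names `age`, `ejection_fraction`, and `death_event` are assumed for the example, and the helper functions are hypothetical. In practice you would likely reach for pandas and scikit-learn instead.

```python
from collections import Counter
from statistics import median

# Hypothetical toy records (NOT taken from the real dataset).
# None marks a missing value; column names are assumptions for illustration.
records = [
    {"age": 60, "ejection_fraction": 38, "death_event": 0},
    {"age": 75, "ejection_fraction": 20, "death_event": 1},
    {"age": 52, "ejection_fraction": None, "death_event": 0},
    {"age": 68, "ejection_fraction": 25, "death_event": 1},
    {"age": 45, "ejection_fraction": 60, "death_event": 0},
]


def impute_median(rows, column):
    """Step 1 (preprocessing): fill missing values with the column median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def summarize(rows, column):
    """Step 2 (EDA): minimal summary statistics for one numeric column."""
    values = [r[column] for r in rows]
    return {"min": min(values), "median": median(values), "max": max(values)}


def majority_baseline_accuracy(train, test, target):
    """Step 3 (baseline): always predict the most common training label."""
    majority = Counter(r[target] for r in train).most_common(1)[0][0]
    correct = sum(r[target] == majority for r in test)
    return majority, correct / len(test)


clean = impute_median(records, "ejection_fraction")
stats = summarize(clean, "ejection_fraction")

# Naive split for illustration only: first three rows train, last two test.
majority_label, baseline_acc = majority_baseline_accuracy(
    clean[:3], clean[3:], "death_event"
)
```

The majority-class baseline is useful because any real classifier has to beat it to be worth keeping; with imbalanced outcomes, it can already look deceptively accurate.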


Predicting Heart Failure Survival with Machine Learning Models 