A step-by-step pythonic walk-through of the analysis of survival data along with domain-level explanation. Cardiovascular diseases are diseases of the heart and blood vessels and they typically include heart attacks, strokes, and heart failures [1]. According to the World Health Organization (WHO), cardiovascular diseases like ischaemic heart disease and stroke have been the leading causes of deaths worldwide for the last decade and a half [2].
Cardiovascular diseases are diseases of the heart and blood vessels and they typically include heart attacks, strokes, and heart failures [1]. According to the World Health Organization (WHO), cardiovascular diseases like ischaemic heart disease and stroke have been the leading causes of deaths worldwide for the last decade and a half [2].
A few months ago, a new heart failure dataset was uploaded on Kaggle. This dataset contained health records of 299 anonymized patients and had 12 clinical and lifestyle features. The task was to predict heart failure using these features.
Through this post, I aim to document my workflow on this task and present it as a research exercise. So this would naturally involve a bit of domain knowledge, references to journal papers, and deriving insights from them.
Warning: This post is nearly 10 minutes long and things may get a little dense as you scroll down, but I encourage you to give it a shot.
The dataset was originally released by Ahmed et al., in 2017 [3] as a supplement to their analysis of survival of heart failure patients at Faisalabad Institute of Cardiology and at the Allied Hospital in Faisalabad, Pakistan. The dataset was subsequently accessed and analyzed by Chicco and Jurman in 2020 to predict heart failures using a bunch of machine learning techniques [4]. The dataset hosted on Kaggle cites these authors and their research paper.
The dataset primarily consists of clinical and lifestyle features of 105 female and 194 male heart failure patients. You can find each feature explained in the figure below.
Fig. 1 — Clinical and lifestyle features of 299 patients in the dataset (credit: author)
The workflow would be pretty straightforward —
heart-disease data-science machine-learning exploratory-data-analysis data-visualization
You will discover Exploratory Data Analysis (EDA), the techniques and tactics that you can use, and why you should be performing EDA on your next problem.
Data science is omnipresent to advanced statistical and machine learning methods. For whatever length of time that there is data to analyse, the need to investigate is obvious.
Learning is a new fun in the field of Machine Learning and Data Science. In this article, we’ll be discussing 15 machine learning and data science projects.
Suppose you are looking to book a flight ticket for a trip of yours. Now, you will not go directly to a specific site and book the first ticket that you see.
A complete step-by-step exploratory data analysis with simple explanation. The dataset used in this project is UCI Heart Disease dataset, and both data and code for this project are available on my GitHub repository.