Start to finish on some open source data.

As a teacher, I’ve had to rely on video conferencing for the last half year, but until now I had never bothered sharing any of my personal projects or explorations via video. I now have a YouTube playlist where I take you through various aspects of a simple machine learning project. In this case, it’s Early Stage Diabetes Prediction, a basic binary-outcome supervised classification project that involves a modest number of mostly categorical features. In this post, I’ll give you a TL;DR-style overview as well as some details, and you can check out my videos or my public notebook to really delve into the code and the ideas.
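To give a flavor of what “mostly categorical features” means in practice, here is a minimal sketch of turning Yes/No-style answers into 0/1 values that a classifier can consume. It is not the code from my notebook: the rows are invented for illustration, and the column names and the category-to-number mapping are assumptions that you should check against the real dataset.

```python
# Toy rows invented for illustration only; they are NOT taken from the real dataset.
# The column names are assumed and may not match the actual file exactly.
TOY_ROWS = [
    {"Age": 45, "Gender": "Male", "Polyuria": "Yes", "Obesity": "No", "class": "Positive"},
    {"Age": 31, "Gender": "Female", "Polyuria": "No", "Obesity": "No", "class": "Negative"},
    {"Age": 58, "Gender": "Female", "Polyuria": "Yes", "Obesity": "Yes", "class": "Positive"},
]

# Assumed mapping from category strings to 0/1.
BINARY_MAP = {
    "Yes": 1, "No": 0,
    "Male": 1, "Female": 0,
    "Positive": 1, "Negative": 0,
}


def encode_value(value):
    """Map a known category string to 0/1; pass numbers through unchanged."""
    if isinstance(value, str):
        # Raises KeyError on an unexpected category, which is usually
        # preferable to silently encoding it as something wrong.
        return BINARY_MAP[value]
    return value


def encode_rows(rows, target="class"):
    """Split rows into feature names, a numeric feature matrix X, and labels y."""
    feature_names = [name for name in rows[0] if name != target]
    X = [[encode_value(row[name]) for name in feature_names] for row in rows]
    y = [encode_value(row[target]) for row in rows]
    return feature_names, X, y


feature_names, X, y = encode_rows(TOY_ROWS)
print(feature_names)
print(X)
print(y)
```

Once the features are numeric like this, `X` and `y` can be handed to a classifier such as scikit-learn’s `RandomForestClassifier`.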

DATA

The dataset is publicly available here, as part of the UCI Machine Learning Repository:

The **_paper_** is entitled Likelihood Prediction of Diabetes at Early Stage Using Data Mining Techniques (Islam et al., 2019), and you can find it here, though unfortunately it is behind a paywall. I have a copy and will share some of it, hopefully in a way that doesn’t get me in hot water.

My Google Colab Notebook with all the code is here, and should be publicly available to read:

My YouTube series on this project has 8 videos so far, which take you through some of the code that you’ll find in the Colab notebook above. This is a living playlist, meaning that I might add extra videos on this topic, especially if there are requests or audience feedback. The videos are labeled 1–8, and each title includes the subtopic covered in that video.


Machine Learning Mini-Project: Early Stage Diabetes Prediction