Scikit-Learn provides clean datasets for you to use when building ML models. And when I say clean, I mean the type of clean that’s ready to be used to train a ML model. The best part? The datasets come with the Scikit-Learn package itself. You don’t need to download anything. Within just a few lines of code, you’ll be working with the data.
Having ready-made datasets is a huge asset because you can get straight to creating models, not having to spend time obtaining, cleaning, and transforming the data — something data scientists spend lots of their time on.
Even with all the ground work complete, you might find using the Scikit-Learn datasets a bit confusing at first. Not to worry, in few minutes you’re going to know exactly how to use the datasets and be well on your way to exploring the world of Artificial Intelligence. This article assumes you have python, scikit-learn, pandas, and Jupyter Notebook (or you may use Google Collab) installed. Let’s begin.
Scikit-Learn provides seven datasets, which they call toy datasets. Don’t be fooled by the word “toy”. These datasets are powerful and serve as a strong starting point for learning ML. Here are few of the datasets and how ML can be used:
In this article, we’ll be working with the “Breast Cancer Wisconsin” dataset. We will import the data and understand how to read it. As a bonus, we’ll build a simple ML model that is able to classify cancer scans either as malignant or benign. To read more about the datasets, click here for Scikit-Learn’s documentation.
#machine-learning #data-science #python #scikit-learn #ai