Any machine learning project starts with loading the dataset and ends (or rather, continues) with finalizing the best model or ensemble of models for predictions on unseen data and deploying it to production.

As machine learning practitioners, we know there are several pit stops along the way to the best possible prediction performance. These intermediate steps include Exploratory Data Analysis (EDA) and data preprocessing (missing value treatment, outlier treatment, changing data types, encoding categorical features, data transformation, feature engineering and selection, sampling and the train-test split, to name a few) before we can embark on model building, evaluation and then prediction.

We end up importing dozens of Python packages to do this, which means getting familiar with the syntax and parameters of multiple function calls in each of them.

Have you wished that there could be a single package that can handle the entire journey end to end with a consistent syntax interface? I sure have!

Enter PyCaret

These wishes were answered by the PyCaret package, and it is now even more awesome with the release of PyCaret 2.0.

Starting with this article, I will post a series on how PyCaret helps us zip through the various stages of an ML project.

Installation

Installation is a breeze and is over in a few minutes, with all dependencies installed along the way. It is recommended to install into a virtual environment, such as a Python 3 virtualenv or a conda environment, to avoid clashes with other pre-installed packages.

pip install pycaret==2.0

Once installed, we are ready to begin! We import the package into our notebook environment. We will take up a classification problem here. Similarly, the respective PyCaret modules can be imported for scenarios involving regression, clustering, anomaly detection, NLP and association rule mining.
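Here is a minimal sketch of what that import can look like. The alias `clf` and the fallback message are my own choices for illustration, and the guard is only there so the cell does not crash if PyCaret is missing from the environment. In a notebook you will often see `from pycaret.classification import *` used instead.

```python
# Import PyCaret's classification module under a short alias.
# The alias `clf` is an arbitrary choice made for this sketch.
try:
    import pycaret.classification as clf
except ImportError:
    clf = None
    print("PyCaret is not installed; run `pip install pycaret==2.0` first.")

# The other tasks live in sibling modules of the same package:
#   pycaret.regression, pycaret.clustering, pycaret.anomaly,
#   pycaret.nlp, pycaret.arules
```

Using an alias instead of a wildcard import keeps it obvious which functions come from PyCaret, at the cost of a little extra typing.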


We will use the Titanic dataset from Kaggle (kaggle.com). You can download the dataset from here.
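Loading the downloaded file could look roughly like the sketch below. It assumes the training file has been saved as `train.csv` in the working directory; the path and the helper name `load_titanic` are assumptions made for this example, so adjust them to wherever you saved the file.

```python
from pathlib import Path

import pandas as pd


def load_titanic(csv_path="train.csv"):
    """Read the Titanic CSV into a DataFrame.

    `csv_path` is an assumed location; point it at your own download.
    """
    path = Path(csv_path)
    if not path.exists():
        raise FileNotFoundError(f"Could not find {path}; download the dataset first.")
    return pd.read_csv(path)


# Only load when the file is actually present in the working directory.
if Path("train.csv").exists():
    data = load_titanic()
    print(data.shape)
```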


Let’s check the first few rows of the dataset using the head() function:
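As a quick illustration of what `head()` does, here is a sketch on a small stand-in DataFrame. The column names follow the Kaggle Titanic training file, but the rows are made-up placeholders and not real passenger records; with the real data you would call `head()` on the DataFrame you loaded.

```python
import pandas as pd

# Placeholder rows for illustration only (not real passenger records).
data = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3, 4, 5, 6],
        "Survived": [0, 1, 1, 0, 1, 0],
        "Pclass": [3, 1, 3, 2, 1, 3],
        "Sex": ["male", "female", "female", "male", "female", "male"],
        "Age": [30.0, 41.0, 25.0, 52.0, 19.0, 33.0],
    }
)

# head() returns the first five rows by default; pass n to change that.
preview = data.head()
print(preview)
```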



PyCaret: The Machine Learning Omnibus