Comparing AutoML/Non Auto-ML Multi-Classification Models

In this post we will show how to use the Prevision.io SDK to build a multi-classification use case, using the white wine quality data from the UCI Machine Learning Repository.

The machine learning objective is to predict white wine quality from its chemical characteristics, such as acidity, pH, density and sulphates.

Furthermore, we will compare Prevision's performance with that of other hand-coded algorithms, and show how both approaches can be compared within exactly the same scope (same cross-validation and same test evaluation), despite the black-box nature of the AutoML solution offered by the Prevision platform.
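One way to keep the hand-coded side of such a comparison inside a single scope is to define the cross-validation folds once and reuse them for every model. The sketch below is only an illustration under stated assumptions: it uses synthetic data from `make_classification` instead of the wine dataset, two arbitrary scikit-learn classifiers, and accuracy as the metric. It does not show the platform side, and the names `cv`, `models` and `scores` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic 3-class problem standing in for the wine data (assumption)
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=42
)

# One fold definition shared by every model: the fixed seed makes the
# splitter return the same folds each time it is used
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Arbitrary candidate models, chosen only for illustration
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

# Every model is scored on exactly the same folds
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    for name, model in models.items()
}

for name, fold_scores in scores.items():
    print(f"{name}: mean accuracy = {fold_scores.mean():.3f}")
```

Because the folds are identical for all models, differences in the per-fold scores come from the models themselves rather than from how the data happened to be split.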

Check out my previous post to see how to install the SDK. If you want to test it, a free trial is available on the public cloud instance.

Auto-ML approach:

Let’s get the dataset:

import pandas as pd
df = pd.read_csv('winequality-white.csv', sep=';')
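Before going further, it can help to look at how the quality grades are distributed, since they are the classes to predict. The snippet below is a minimal sketch that reads a few made-up rows in the same `;`-separated layout as the file, so that it runs without the CSV; `sample_csv`, `sample_df` and `class_counts` are illustrative names, and the values are not taken from the real data.

```python
import io

import pandas as pd

# Tiny inline sample mimicking the ';'-separated layout of winequality-white.csv.
# The values are made up for illustration; use the real file path in practice.
sample_csv = io.StringIO(
    '"fixed acidity";"volatile acidity";"citric acid";"pH";"alcohol";"quality"\n'
    "7.0;0.27;0.36;3.00;8.8;6\n"
    "6.3;0.30;0.34;3.30;9.5;6\n"
    "8.1;0.28;0.40;3.26;10.1;6\n"
    "7.2;0.23;0.32;3.19;9.9;5\n"
    "6.2;0.32;0.16;3.18;9.6;7\n"
    "8.1;0.22;0.43;3.22;11.0;5\n"
)
sample_df = pd.read_csv(sample_csv, sep=";")

# Number of rows per quality grade: shows which classes exist
# and how balanced they are
class_counts = sample_df["quality"].value_counts().sort_index()
print(class_counts)
```

On the real dataset the same two lines applied to `df` would show whether some grades are rare, which matters when choosing how to split and evaluate.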

Create train / test subsets

Let's create a sub-sample (about 20% of the overall dataset) to use as a holdout dataset for evaluating the generalization error of our models. This sub-sample will be put aside and not used for training, so it tells us how well our models perform on new data (not seen during the training phase).

There are many ways to create this sample; the simplest is to use train_test_split() from scikit-learn:

from sklearn.model_selection import train_test_split
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
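As one of the other possible ways, the split can be stratified on the target so that each quality grade keeps the same share in both subsets. This is a sketch of an alternative, not what the post itself does: it relies on the `stratify` parameter of `train_test_split` and on a synthetic frame (`toy_df`, a hypothetical name) instead of the wine data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic frame with an imbalanced 3-class target (made-up values)
toy_df = pd.DataFrame({
    "alcohol": [9.0 + 0.01 * i for i in range(100)],
    "quality": [5] * 30 + [6] * 50 + [7] * 20,
})

# stratify keeps the class proportions identical in train and test
strat_train, strat_test = train_test_split(
    toy_df, test_size=0.2, random_state=42, stratify=toy_df["quality"]
)

print(strat_test["quality"].value_counts().sort_index())
```

With rare classes, a purely random split can leave very few examples of a grade in the holdout set, which makes per-class metrics unstable; stratifying avoids that.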

During the feature engineering step, you can construct two types of features:

  • Business derived features: for example, with some knowledge of chemistry, we can combine fixed acidity, volatile acidity and citric acid into a new explanatory feature.
  • Transformation based features: these are derived from ML operations such as scaling, encoding, normalization or PCA components, which create new features that are more valuable to the models.
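The two kinds of features can be sketched as follows. This is an illustration under assumptions: the `total_acidity` feature (a plain sum of the three acidity columns) is a hypothetical example rather than a chemically validated one, the rows are made up, and the scaling and PCA steps use scikit-learn rather than the platform.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# A few made-up rows using column names from the wine dataset
wines = pd.DataFrame({
    "fixed acidity": [7.0, 6.3, 8.1, 7.2, 6.2],
    "volatile acidity": [0.27, 0.30, 0.28, 0.23, 0.32],
    "citric acid": [0.36, 0.34, 0.40, 0.32, 0.16],
    "pH": [3.00, 3.30, 3.26, 3.19, 3.18],
})
acid_cols = ["fixed acidity", "volatile acidity", "citric acid"]
raw_cols = list(wines.columns)

# Business derived feature (hypothetical): sum of the acidity measurements
wines["total_acidity"] = wines[acid_cols].sum(axis=1)

# Transformation based features: standardize the raw columns,
# then keep the first two principal components as new columns
scaled = StandardScaler().fit_transform(wines[raw_cols])
components = PCA(n_components=2).fit_transform(scaled)
wines["pca_1"] = components[:, 0]
wines["pca_2"] = components[:, 1]

print(wines.round(3))
```

In a real workflow the scaler and the PCA would be fitted on the training subset only and then applied to the holdout set, so that no information from the test data leaks into the features.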

The second kind of feature engineering is supported by the platform: once you launch the use case on your dataset, you can select the transformations you want to apply, and they will be computed automatically and added as stand-alone features. For more information about the feature transformations supported by the platform, consult this link.
