The what and why behind fit_transform() vs transform() in scikit-learn

Scikit-learn is one of the most widely used machine learning libraries for Python. It offers many tools for building a machine learning model and is quite easy to use. Yet we sometimes struggle to understand some of the simplest methods, the ones we call almost every time we build a model.

One such method is fit_transform(), and another is transform(). Both are methods of the class *sklearn.preprocessing.StandardScaler()* and are almost always used together when scaling or standardizing our training and test data.

The motivation to write this blog came from multiple questions posted on these methods in an online course on Machine Learning.

The question is:

Why do we use fit_transform() on the training data but transform() on the test data?

We all know that we call the fit_transform() method on our training data and the transform() method on our test data. But the real question is: why do we do this? My aim is to explain this simple yet confusing point in the simplest possible manner. So let’s get started!

Suppose we are building a k-Nearest Neighbor model and we have to scale our features. The most common way to scale the features is through scikit-learn’s StandardScaler class.

Note:

  1. Data standardization is the process of rescaling the attributes so that each has a mean of 0 and a variance of 1.
  2. The goal of standardization is to bring all the features to a common scale without distorting the differences in the ranges of their values.
  3. In sklearn.preprocessing.StandardScaler(), centering and scaling happen independently on each feature.
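
The notes above can be checked with a small sketch. The toy array below is made up for illustration (it is not data from the course), and the variable names are assumptions; the point is only that each column ends up with a mean of about 0 and a variance of about 1, regardless of its original scale.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

X_scaled = StandardScaler().fit_transform(X)

# Centering and scaling are done column by column (feature by feature).
col_means = X_scaled.mean(axis=0)
col_vars = X_scaled.var(axis=0)

print("means:", col_means)      # approximately 0 for each feature
print("variances:", col_vars)   # approximately 1 for each feature
```

Both columns come out on the same scale even though the second feature was originally hundreds of times larger than the first.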

The magical formula which performs standardization:

    z = (x − μ) / σ

where x is a feature value, μ is the mean of that feature, and σ is its standard deviation.
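
As a rough check, the sketch below applies the standardization formula, z = (x - mean) / standard deviation, by hand with NumPy and compares the result with StandardScaler. The toy array and the names mu, sigma, z_manual and z_sklearn are assumptions made for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with two features.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

# Per-feature mean and standard deviation.
mu = X.mean(axis=0)
sigma = X.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

# Apply the formula by hand.
z_manual = (X - mu) / sigma

# Let scikit-learn do the same thing.
z_sklearn = StandardScaler().fit_transform(X)

matches = np.allclose(z_manual, z_sklearn)
print("manual result matches StandardScaler:", matches)
```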

Let’s now dive deep into the concept.

fit_transform()

fit_transform() is used on the training data so that we can scale the training data and, at the same time, learn the scaling parameters of that data. Here, the scaler (not the downstream model) learns the mean and variance of each feature of the training set. These learned parameters are then used to scale our test data.
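
A minimal sketch of this workflow, assuming a made-up single-feature training and test set (X_train and X_test are hypothetical): fit_transform() learns the mean and variance from the training data and scales it, and transform() then reuses those same learned values on the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature data.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_test = np.array([[3.0], [6.0]])

scaler = StandardScaler()

# Learn mean and variance from the training data, then scale it.
X_train_scaled = scaler.fit_transform(X_train)

# The learned parameters are stored on the scaler.
print("learned mean:", scaler.mean_)      # [3.]
print("learned variance:", scaler.var_)   # [2.]

# Scale the test data with the training mean and variance.
# Nothing is re-learned from X_test here.
X_test_scaled = scaler.transform(X_test)
print(X_test_scaled)
```

Note that the scaled test values need not have a mean of 0 or a variance of 1, because they are scaled with statistics learned from the training set only.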

So what is actually happening here? 🤔
