Scikit-learn is one of the most useful libraries for machine learning in the Python programming language. It offers many tools for building a machine learning model and is quite easy to use. Yet we sometimes struggle to understand some of the very simple methods that we use almost every time we build a model.
One such method is fit_transform() and another is transform(). Both are methods of the class *sklearn.preprocessing.StandardScaler*, and they are almost always used together when scaling or standardizing our training and test data.
The motivation to write this blog came from multiple questions posted on these methods in an online course on Machine Learning.
The question is:
Why do we use fit_transform() on the training data but transform() on the test data?
We all know that we call fit_transform() method on our training data and transform() method on our test data. But the actual question is why do we do this? My motive is to explain this simple yet confusing point in the simplest possible manner. So let’s get started!
Suppose we are building a k-Nearest Neighbor model and we have to scale our features. The most common way to scale the features is through scikit-learn’s StandardScaler class.
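A minimal sketch of that setup is shown here. The data is made up for illustration (two features on very different scales), and the variable names are assumptions, not part of any course material.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column 0 is age in years, column 1 is income in dollars.
X_train = np.array(
    [[25, 40_000], [30, 42_000], [45, 90_000], [50, 95_000]], dtype=float
)
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[28, 41_000], [48, 92_000]], dtype=float)

scaler = StandardScaler()

# Learn the scaling parameters from the training data and scale it.
X_train_scaled = scaler.fit_transform(X_train)

# Scale the test data with the parameters learned from the training data.
X_test_scaled = scaler.transform(X_test)

# k-NN relies on distances, so both features should be on a comparable scale.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train_scaled, y_train)
predictions = knn.predict(X_test_scaled)
print(predictions)
```

Without scaling, the income column would dominate the distance computation simply because its numbers are larger.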
Note:
The magical formula which performs standardization:
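In its standard z-score form, which is what StandardScaler applies to each feature with its default settings, the formula reads:

$$
z = \frac{x - \mu}{\sigma}
$$

Here $x$ is a feature value, $\mu$ is the mean of that feature in the training data, and $\sigma$ is its standard deviation in the training data (the square root of the variance).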
Let’s now deep dive into the concept.
fit_transform()
fit_transform() is used on the training data so that we can scale the training data and also learn the scaling parameters of that data. Here, the scaler (not the machine learning model itself) learns the mean and variance of each feature in the training set. These learned parameters are then used to scale our test data.
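To make this concrete, here is a small sketch on a made-up one-feature dataset. It inspects the parameters the scaler stores after fit_transform() (the `mean_` and `var_` attributes) and checks that transform() reuses them on the test data instead of recomputing them.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, invented for illustration: one feature.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()

# fit_transform() learns the per-feature mean and variance from X_train
# and returns the scaled training data in a single call.
X_train_scaled = scaler.fit_transform(X_train)

# The learned parameters live on the scaler object.
print(scaler.mean_)  # mean of the training feature
print(scaler.var_)   # variance of the training feature

# transform() only applies the stored training statistics to new data.
X_test_scaled = scaler.transform(X_test)

# Scaling the test data by hand with the training mean and variance
# gives the same result as transform().
manual = (X_test - scaler.mean_) / np.sqrt(scaler.var_)
print(np.allclose(X_test_scaled, manual))
```

Calling fit_transform() on the test data instead would compute a new mean and variance from the test set, so the training and test features would no longer be on the same scale.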
So what is actually happening here? 🤔