Feature Scaling and Its Importance in Data Preprocessing: Normalization vs Standardization

Feature Scaling

Feature scaling refers to the methods used to normalize the range of the independent variables (features) in a dataset, that is, to bring the values of different features onto a similar scale. It is usually one of the last steps of the data preprocessing pipeline, performed just before training a machine learning model.

Feature magnitude matters because:

  • The regression coefficients of linear models are directly influenced by the scale of the variable.
  • Variables with a bigger magnitude or a larger value range can dominate those with a smaller magnitude or range, for example in distance calculations.
  • Gradient descent converges faster when features are on similar scales.
  • Feature scaling helps decrease the time needed to find the support vectors of an SVM.
  • Euclidean distances are sensitive to feature magnitude.
  • Some algorithms, like PCA, require the features to be centered at 0.
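To make the point about Euclidean distance concrete, here is a minimal sketch with three hypothetical people described by age and income. The data points, the helper names (`euclidean`, `min_max`) and the assumed feature ranges (age 18 to 70, income 20,000 to 120,000) are all invented for illustration.

```python
import math

# Hypothetical data points: (age in years, annual income in dollars).
a = (25.0, 50_000.0)
b = (60.0, 51_000.0)  # very different age, similar income
c = (26.0, 58_000.0)  # similar age, different income


def euclidean(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def min_max(p, lows=(18.0, 20_000.0), highs=(70.0, 120_000.0)):
    """Min-max scale each feature to [0, 1] using ASSUMED feature ranges."""
    return tuple((x - lo) / (hi - lo) for x, lo, hi in zip(p, lows, highs))


# Raw features: income differences (thousands) swamp age differences (tens),
# so b appears much closer to a than c does.
raw_ab, raw_ac = euclidean(a, b), euclidean(a, c)

# Scaled features: both variables contribute on a comparable scale,
# and the ranking flips, so c becomes the nearer neighbour of a.
scaled_ab = euclidean(min_max(a), min_max(b))
scaled_ac = euclidean(min_max(a), min_max(c))

print(f"raw:    d(a,b)={raw_ab:.1f}  d(a,c)={raw_ac:.1f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

On the raw values the distances are roughly 1000.6 and 8000.0, so income alone decides which point is "nearest"; after scaling they become about 0.673 and 0.082, and the 35-year age gap finally counts.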

Machine learning models affected by feature scaling include:

  • Linear and Logistic Regression
  • Neural Networks
  • Support Vector Machines
  • KNN
  • K-means clustering
  • Linear Discriminant Analysis (LDA)
  • Principal Component Analysis (PCA)

There are several feature scaling techniques, such as:

  • Standardisation
  • Mean normalisation
  • Scaling to minimum and maximum values — MinMaxScaling
  • Scaling to maximum value — MaxAbsScaling
  • Scaling to quantiles and median — RobustScaling
  • Normalization to vector unit length
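The techniques listed above can be sketched in a few lines of plain Python. The functions below use the common textbook formulas; the function names and the toy feature values are hypothetical, and libraries such as scikit-learn may differ in details (for example in how the standard deviation or the quantiles are computed). Each function assumes a non-constant feature, otherwise the divisions would fail.

```python
import math
import statistics


def standardise(xs):
    """Standardisation: (x - mean) / std, using the population std."""
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]


def mean_normalise(xs):
    """Mean normalisation: (x - mean) / (max - min)."""
    mu, span = statistics.fmean(xs), max(xs) - min(xs)
    return [(x - mu) / span for x in xs]


def min_max_scale(xs):
    """Min-max scaling: (x - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def max_abs_scale(xs):
    """Max-abs scaling: x / max(|x|), giving values in [-1, 1]."""
    m = max(abs(x) for x in xs)
    return [x / m for x in xs]


def robust_scale(xs):
    """Robust scaling: (x - median) / (Q3 - Q1), i.e. divide by the IQR."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in xs]


def unit_length(row):
    """Unit-length normalisation of one observation (a row): v / ||v||_2."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row]


# Hypothetical single feature with one large outlier (100.0).
feature = [2.0, 4.0, 6.0, 8.0, 100.0]
for scaler in (standardise, mean_normalise, min_max_scale, max_abs_scale, robust_scale):
    print(scaler.__name__, [round(v, 3) for v in scaler(feature)])
print("unit_length", unit_length([3.0, 4.0]))
```

Note that the first five functions work column-wise (on one feature across all observations), while unit-length normalisation works row-wise (on one observation across all features). On this toy feature, min-max scaling squeezes the four ordinary values below 0.07 because the outlier sets the maximum, whereas robust scaling maps them to [-1, 0.5] and leaves the outlier at 23.5.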

In this article, I will focus on the importance of standardisation and normalisation.

Standardisation

Standardisation involves centering the variable at zero and scaling its variance to 1. The procedure subtracts the mean of the variable from each observation and then divides the result by the standard deviation:

$$ z = \frac{x - x_{\text{mean}}}{\text{std}} $$

The result of this transformation, z, is called the z-score. It represents how many standard deviations a given observation lies from the mean, and therefore specifies the location of the observation within the distribution. The sign of the z-score (+ or -) indicates whether the observation is above (+) or below (-) the mean.
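As a small worked example of the formula, the sketch below computes z-scores for a hypothetical set of exam scores. It assumes the population standard deviation (`statistics.pstdev`); the data and the helper name `z_score` are invented for illustration.

```python
import statistics

# Hypothetical exam scores for a single variable.
scores = [55.0, 60.0, 65.0, 70.0, 75.0, 80.0, 85.0]

mean = statistics.fmean(scores)  # 70.0
std = statistics.pstdev(scores)  # 10.0 (population standard deviation)


def z_score(x, mean, std):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std


z = [z_score(x, mean, std) for x in scores]
print(z)  # [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
```

A score of 85 gets z = +1.5 (one and a half standard deviations above the mean), while a score of 60 gets z = -1.0 (one standard deviation below it).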

The shape of a standardised (or z-score normalised) distribution is identical to the original distribution of the variable. If the original distribution is normal, the standardised distribution will be normal; if the original distribution is skewed, the standardised distribution will be skewed too. In other words, standardising a variable does not make its distribution normal (Gaussian).
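One way to check that standardisation preserves the shape of a distribution is to compare a shape statistic such as skewness before and after the transformation. The sketch below assumes the population (Fisher-Pearson) skewness, computed as the mean of the cubed z-scores, and uses an invented right-skewed variable. Because standardisation only shifts and rescales the data by a positive factor, the skewness stays the same up to floating-point error.

```python
import statistics


def standardise(xs):
    """(x - mean) / std with the population standard deviation."""
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]


def skewness(xs):
    """Population (Fisher-Pearson) skewness: the mean of the cubed z-scores."""
    return statistics.fmean(z ** 3 for z in standardise(xs))


# Hypothetical right-skewed variable, e.g. incomes in thousands.
incomes = [20.0, 22.0, 25.0, 27.0, 30.0, 35.0, 40.0, 60.0, 150.0]
z_incomes = standardise(incomes)

print(f"skewness before: {skewness(incomes):.4f}")
print(f"skewness after:  {skewness(z_incomes):.4f}")  # same value: shape unchanged
```

The standardised variable is centred at 0 with unit variance, but it is just as right-skewed as the original incomes.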

In a nutshell, standardization:

  • centers the mean at 0
  • scales the variance to 1
  • preserves the shape of the original distribution
  • does not bound values to a fixed range, so the minimum and maximum values can differ between variables
  • preserves outliers

This makes standardisation a good choice for algorithms that require features centered at zero, such as PCA.
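The last two properties in the summary (different minimum and maximum values across variables, and preserved outliers) can be seen by standardising two variables side by side. The dataset below is hypothetical, with 230 cm deliberately placed as a height outlier, and the sketch again assumes the population standard deviation.

```python
import statistics


def standardise(xs):
    """(x - mean) / std with the population standard deviation."""
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]


# Hypothetical dataset with two variables; 230 cm is an outlier in height.
heights_cm = [160.0, 165.0, 170.0, 175.0, 230.0]
weights_kg = [50.0, 60.0, 70.0, 80.0, 90.0]

z_height = standardise(heights_cm)
z_weight = standardise(weights_kg)

for name, z in (("height", z_height), ("weight", z_weight)):
    # Both variables end up with mean 0 and variance 1 ...
    print(f"{name}: mean={statistics.fmean(z):+.2f} var={statistics.pvariance(z):.2f}", end=" ")
    # ... but their minimum and maximum values differ,
    # and the height outlier is still far from the rest.
    print(f"min={min(z):+.2f} max={max(z):+.2f}")
```

Here standardised height spans roughly -0.78 to +1.96 while standardised weight spans about -1.41 to +1.41, and the 230 cm observation keeps by far the largest z-score, so the outlier is not removed.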
