In machine learning, a model is only as good (or as bad) as the data it is trained on. The magnitude of different features affects many machine learning models, for various reasons.

For example, consider a data set containing two features, age and income. Age ranges from 0 to 100, while income ranges from 0 up to very large values, usually far higher than 100; income can easily be 1,000 times larger than age. So these two features lie in very different ranges. When we do further analysis, such as multivariate linear regression, the income attribute will intrinsically influence the result more due to its larger magnitude, but that doesn't necessarily mean it is a more important predictor. Therefore, all features should be scaled to similar ranges so that each feature contributes approximately proportionately to the final distance.
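The effect above can be seen numerically. The following sketch (with made-up ages, incomes, and feature ranges) computes the Euclidean distance between two people before and after min-max scaling:

```python
import numpy as np

# Two hypothetical people described by (age, income); values assumed for illustration
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 52_000.0])

# Raw Euclidean distance: the income gap (2,000) swamps the age gap (35)
raw_dist = np.linalg.norm(a - b)  # ≈ 2000.3

# Min-max scale each feature to [0, 1] using assumed feature ranges
mins = np.array([0.0, 0.0])
maxs = np.array([100.0, 200_000.0])
a_scaled = (a - mins) / (maxs - mins)
b_scaled = (b - mins) / (maxs - mins)

# After scaling, the 35-year age gap dominates the distance instead
scaled_dist = np.linalg.norm(a_scaled - b_scaled)  # ≈ 0.35
```

Before scaling, the distance is essentially just the income difference; after scaling, age contributes meaningfully.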

This is exactly why feature scaling is essential.

In this article, we will discuss what feature scaling is and why it matters, the techniques used to achieve it, their usefulness, and Python snippets implementing them.

The flow of discussion will be as follows:

  • Feature Scaling
  • Normalization
  • Standardization
  • Implementation
  • When to use what?

Feature scaling

Feature scaling is a technique to change the values of numeric columns in the dataset to use a common scale, without distorting differences in the ranges of values or losing information.

Why use Feature Scaling?

  1. Gradient descent converges much faster with feature scaling than without it.
  2. Many distance-based algorithms (like KNN and K-means) compute the Euclidean distance between two points. If one of the features has a much broader range of values than the others, the distance will be governed by that particular feature. So the features should be scaled so that each one contributes approximately proportionately to the final distance.
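Point 2 can be illustrated with scikit-learn's KNN classifier. In this sketch the data is synthetic and assumed purely for illustration: the label depends only on age, while income is noise on a much larger scale, so an unscaled KNN picks neighbours that are nearly random with respect to the informative feature.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed): the label depends only on age;
# income is pure noise on a vastly larger scale
rng = np.random.default_rng(0)
age = rng.uniform(0, 100, 200)
income = rng.uniform(0, 200_000, 200)
X = np.column_stack([age, income])
y = (age > 50).astype(int)

# Without scaling, income dominates the Euclidean distance
knn_raw = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Standardizing first puts both features on the same scale,
# letting the informative age feature drive the distance
knn_scaled = make_pipeline(StandardScaler(),
                           KNeighborsClassifier(n_neighbors=5)).fit(X, y)
```

Comparing `knn_raw.score(X, y)` with `knn_scaled.score(X, y)` shows the scaled pipeline fitting the age-based labels far better, because its neighbours are chosen by the feature that actually carries signal.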

However, not every dataset requires feature scaling. It is needed only when features have very different ranges.

This can be achieved using two widely used techniques:

  1. Normalization
  2. Standardization
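Both techniques can be expressed in a few lines of NumPy. The toy matrix below is assumed for illustration, with one row per sample and columns for age and income:

```python
import numpy as np

# Toy data (assumed): rows are samples, columns are (age, income)
X = np.array([[25.0,  50_000.0],
              [40.0, 120_000.0],
              [60.0,  80_000.0]])

# Normalization (min-max scaling): x' = (x - min) / (max - min),
# rescaling each feature column to the range [0, 1]
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization (z-score scaling): x' = (x - mean) / std,
# giving each feature column zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

In practice the same transforms are available as `MinMaxScaler` and `StandardScaler` in `sklearn.preprocessing`, which also remember the training-set statistics so test data can be scaled consistently.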

