You may come across datasets with a lot of numerical noise built in, such as high variance or features on very different scales. The preprocessing solution for this is standardization.

Standardization is a preprocessing method used to transform continuous data so that it looks normally distributed. In scikit-learn this is often a necessary step, because many models assume that the data you are training on is normally distributed; if it isn’t, **you risk biasing your model**. You can standardize your data in different ways. In this article, we’re going to talk about two popular **data scaling** methods: **normalization** and **standardization**.
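To make the distinction concrete, here is a minimal sketch of both methods using scikit-learn’s `MinMaxScaler` (normalization to a [0, 1] range) and `StandardScaler` (standardization to zero mean and unit variance). The toy array `X` is an assumption, chosen only to show features on very different scales:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data: two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

# Normalization: rescale each feature to the [0, 1] range.
X_norm = MinMaxScaler().fit_transform(X)

# Standardization: center each feature to mean 0, variance 1.
X_std = StandardScaler().fit_transform(X)

print(X_norm.min(axis=0), X_norm.max(axis=0))  # each column now spans [0, 1]
print(X_std.mean(axis=0), X_std.std(axis=0))   # each column now has mean ~0, std ~1
```

Both scalers follow the usual scikit-learn `fit`/`transform` pattern, so you can fit them on training data and reuse the fitted scaler on test data.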

It’s also important to note that standardization is applied to continuous, numerical data. There are a few different scenarios in which you’ll want to standardize your data:

**First**, if you are working with any kind of model that uses a linear distance metric or operates in a linear space, like k-nearest neighbors, linear regression, or k-means clustering, the model assumes that the data and features you’re giving it are related in a linear fashion, or can be measured with a linear distance metric.
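A quick sketch of why this matters for distance-based models: with unscaled features, a Euclidean distance is dominated by whichever feature has the largest scale. The features here (age in years, income in dollars) are hypothetical, picked just to illustrate the effect:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical samples: [age in years, income in dollars].
X = np.array([[25.0, 50_000.0],
              [30.0, 52_000.0],
              [26.0, 90_000.0]])

# Raw Euclidean distance between samples 0 and 1 is driven
# almost entirely by the income column (~2000 of it).
d_raw = np.linalg.norm(X[0] - X[1])

# After standardization, both features contribute comparably.
X_scaled = StandardScaler().fit_transform(X)
d_scaled = np.linalg.norm(X_scaled[0] - X_scaled[1])

print(d_raw, d_scaled)
```

Any model that compares samples with such a distance (k-NN, k-means) will effectively ignore the age feature unless the data is scaled first.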

**Second**, and related to this, is the case where one or more features in your dataset have high variance. This can bias a model that assumes the data is normally distributed. If a feature in your dataset has a variance that’s an order of magnitude or more greater than the other features, it can impair the model’s ability to learn from the other features in the dataset.
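You can check for this directly by comparing per-feature variances before and after scaling. The synthetic data below is an assumption for illustration: one feature with variance near 1, another with variance near 10,000.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical dataset: feature 0 has variance ~1, feature 1 ~10_000.
X = np.column_stack([rng.normal(0, 1, 500),
                     rng.normal(0, 100, 500)])

print(X.var(axis=0))  # variances several orders of magnitude apart

# StandardScaler brings every feature to unit variance.
X_std = StandardScaler().fit_transform(X)
print(X_std.var(axis=0))  # both features now have variance ~1
```

Inspecting `X.var(axis=0)` (or `df.var()` on a DataFrame) before training is a cheap way to spot features that would otherwise dominate the model.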

