You will often come across datasets with a lot of numerical noise built in, such as high variance or features on very different scales, so good preprocessing is a must before you even think about machine learning. A common preprocessing solution for this type of problem is standardization.
Photo by Fidel Fernando on Unsplash
Standardization is a preprocessing method used to transform continuous data so that it looks normally distributed. In scikit-learn, this is often a necessary step, because many models assume that the data you are training on is normally distributed; if it isn’t, you risk biasing your model.
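The transformation itself is simple: subtract the feature’s mean and divide by its standard deviation. A minimal NumPy sketch, with made-up values:

```python
import numpy as np

# A single feature with a large offset and scale (hypothetical values)
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Standard scaling: subtract the mean, divide by the standard deviation
z = (x - x.mean()) / x.std()

print(z.mean())  # ~0 after scaling
print(z.std())   # ~1 after scaling
```

After the transformation, the feature is centered at zero with unit variance, regardless of its original scale.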
You can standardize your data in different ways, and in this article we’re going to talk about one popular scaling method in particular: standard scaling.
It’s also important to note that standardization is a preprocessing method applied to continuous, numerical data, and there are a few different scenarios in which you want to use it:
Let’s now proceed with the data scaling.
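In scikit-learn, standard scaling is handled by the `StandardScaler` class. A short sketch on hypothetical data with two differently-scaled features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0]])

# Fit the scaler on the data, then transform it in one step
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # each column now has mean ~0
print(X_scaled.std(axis=0))   # each column now has std ~1
```

In practice, you would call `fit_transform` on the training set only and then `transform` on the test set, so that no test-set statistics leak into training.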