Unsupervised machine learning is often used hand in hand with supervised machine learning algorithms, and one of the most popular unsupervised learning techniques is Principal Component Analysis (PCA). PCA is popular because it can transform our set of independent variables into a new, smaller set of variables that carries most of the useful information and less of the noise.

With less noise and fewer dimensions, the dataset becomes much lighter and easier to visualize. It is also easier for our ML models to process, with less risk of overfitting. That’s why PCA is a favorite of many data engineers whose job includes analyzing data to cut the cost of processing it on cloud machines, in terms of both speed and storage.

“Time saved is money saved for the industry, and PCA handles this very diligently.”

So with this wisdom at our disposal, it’s time to uncover this extremely powerful machine learning tool called PCA.

What Is PCA?

PCA is an unsupervised ML technique for reducing the dimensionality of a large dataset that has many independent variables showing collinearity or correlation among themselves.

In other words, PCA performs dimensionality reduction while reducing the noise in the given independent variables.
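The following is a minimal from-scratch sketch in Python with NumPy. It assumes the standard eigendecomposition formulation of PCA:

  • center the features
  • compute their covariance matrix
  • project onto the eigenvectors with the largest eigenvalues

The function name `pca` and the synthetic data are illustrative assumptions, not taken from this article; the data is three columns that are the same signal plus a little noise. Because these toy columns already share a scale, the sketch only centers them. Real data is usually standardized first.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    # Center each feature at zero mean; PCA is defined on centered data.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (n_features x n_features).
    cov = np.cov(X_centered, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so the direction with the largest variance comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    components = eigenvectors[:, :n_components]
    # Share of the total variance captured by each kept component.
    explained_ratio = eigenvalues[:n_components] / eigenvalues.sum()
    return X_centered @ components, explained_ratio

# Hypothetical data: three strongly correlated columns built from one signal.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(3)])

X_reduced, ratio = pca(X, n_components=1)
print(X_reduced.shape)  # (200, 1)
print(ratio)            # most of the variance lands in the first component
```

With three nearly identical columns, a single component should hold almost all of the variance. This is exactly the collinear situation described above.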

One has to understand how dimensionality reduction works before one can really assess how valuable PCA is in unsupervised learning, so let’s get into the details of “Dimensionality Reduction”.

What is Dimensionality Reduction & How Does It Work?

Dimensions here are the columns present in our dataframe, and when it comes to reducing those columns we consider only the independent features; the target stays untouched. The technique of reducing the number of these independent variables is called dimensionality reduction.
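Here is a small illustration, assuming pandas and a hypothetical toy dataframe (the column names and values below are made up). The dimensions are the feature columns that remain once the target is set aside:

```python
import pandas as pd

# Hypothetical toy dataset: three independent features plus a target column.
df = pd.DataFrame({
    "height_cm": [170, 182, 165, 175],
    "weight_kg": [68, 85, 59, 74],
    "shoe_size": [42, 45, 39, 43],
    "target":    [0, 1, 0, 1],
})

# Only the independent features are candidates for dimensionality reduction.
X = df.drop(columns=["target"])
y = df["target"]

print(X.shape[1])  # 3 -> the data currently has three dimensions
```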

How Is Dimensionality Reduction Achieved?

Dimensionality reduction is achieved using one of the two techniques given below:

  • Feature Elimination
  • Feature Extraction

Feature Elimination:

It’s a simple but very harsh method: we get rid of the feature columns that our analysis suggests are unimportant.

Disadvantage:

  • The obvious disadvantage of this method is that we lose all the information a dropped feature had to offer, which can be of prime importance. For this reason, it is the least preferred way to reduce the dimensions of a dataset in professional settings.

Advantages of Feature Elimination:

  • It is easy to interpret
  • It can give a high level of accuracy, but at the cost of model overfitting
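One common way to decide which columns to eliminate is to drop one column from each pair of highly correlated features. This is sketched here as an assumption, not as a method from this article. The helper `drop_correlated`, the 0.9 threshold, and the synthetic columns are all hypothetical.

```python
import numpy as np
import pandas as pd

def drop_correlated(X, threshold=0.9):
    """Drop one column from each pair whose absolute correlation exceeds threshold."""
    corr = X.corr().abs()
    # Keep only the upper triangle (above the diagonal) so each pair is checked once.
    mask = np.triu(np.ones(corr.shape), k=1).astype(bool)
    upper = corr.where(mask)
    # A column is dropped if it is highly correlated with any column before it.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop), to_drop

rng = np.random.default_rng(42)
height_cm = rng.normal(170, 10, size=100)
X = pd.DataFrame({
    "height_cm": height_cm,
    "height_in": height_cm / 2.54,            # same information in different units
    "age": rng.integers(18, 65, size=100),    # unrelated feature
})

X_reduced, dropped = drop_correlated(X, threshold=0.9)
print(dropped)                    # ['height_in']
print(list(X_reduced.columns))    # ['height_cm', 'age']
```

The duplicate height column carries no new information, so dropping it is harmless here. On real data, however, a dropped column may hold signal the others lack, which is the disadvantage described above.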

Feature Extraction:

In feature extraction, the intuition is to capture the meaningful information in the existing set of features and create a new set of feature columns. This new set retains the valuable information while eliminating as much of the noise as possible.

Now that you understand the concept of dimensionality reduction, it’s time to understand the role of PCA. When it comes to extracting meaningful information from our feature variables, PCA is the way to go.

PCA is the tool for doing feature extraction in a careful and intelligent way.
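Below is a minimal sketch using scikit-learn's `PCA` class. It assumes scikit-learn is installed and uses the bundled Iris dataset purely for illustration. The features are standardized first. A fractional `n_components` then asks PCA to keep just enough components to explain more than 95% of the variance; the 0.95 target is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # 150 samples, 4 correlated features

# Put features on a common scale so no column dominates just by its units.
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the smallest number of components whose
# cumulative explained variance exceeds that fraction.
pca = PCA(n_components=0.95, svd_solver="full")
X_extracted = pca.fit_transform(X_scaled)

print(X_scaled.shape, "->", X_extracted.shape)
print(pca.explained_variance_ratio_)
```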

These extracted features are then generally used in our supervised or deep learning models to make the required predictions.
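This sketch shows one possible scikit-learn pipeline in which the extracted features feed a supervised model. It again assumes the bundled Iris dataset. The choice of logistic regression, two components, and the split settings are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Scale -> extract 2 principal components -> classify on those components.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy using 2 extracted features: {accuracy:.2f}")
```

Wrapping PCA in the pipeline means it is fit only on the training split. As a result, no information from the test data leaks into the extracted features.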


Intuition Behind Principal Component Analysis