In this blog we will be looking at one of the most important dimensionality reduction techniques: Principal Component Analysis (PCA).


Using PCA, we can examine the correlation between variables, such as whether summer affects the sale of ice cream, and by how much. In PCA we will be generating a covariance matrix to check the correlation, but let’s start it from scratch.
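As a rough illustration (not from the original post), here is a minimal NumPy sketch with made-up temperature and ice-cream sales figures, just to show what the covariance matrix looks like and how it relates to correlation:

```python
import numpy as np

# Hypothetical data: average summer temperature (°C) and ice-cream units sold
temperature = np.array([22, 25, 28, 31, 34, 36])
ice_cream_sales = np.array([110, 135, 160, 200, 240, 260])

# The 2x2 covariance matrix; the off-diagonal entries show how the two vary together
cov_matrix = np.cov(temperature, ice_cream_sales)
print(cov_matrix)

# Normalised covariance = correlation coefficient (close to +1 for this data)
print(np.corrcoef(temperature, ice_cream_sales)[0, 1])
```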

As we said earlier, PCA is a dimensionality reduction technique, so let’s first take a look at how to reduce dimensions.

But why do we need to reduce dimensions?

PCA tries to remove a common curse in any ML project: OVERFITTING. Overfitting is a problem that arises when the model is too accurate on the training data, i.e. the model fits nearly every point in the training dataset and fails to generalise. To reduce this overfitting, we generate the principal components.

Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables. These combinations are made in such a way that the new variables (i.e., principal components) are uncorrelated and most of the information within the initial variables is squeezed or compressed into the first components. So the idea is that 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on.
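To see this concentration of information, here is a minimal sketch, assuming scikit-learn is available, on synthetic 10-dimensional data (all names and numbers below are illustrative): 10 attributes give 10 components, and the explained variance ratio drops off after the first few.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                           # 200 samples, 10 attributes
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)  # make two attributes correlated

pca = PCA()   # keep all 10 principal components
pca.fit(X)

# Share of the total variance captured by each component, largest first
print(pca.explained_variance_ratio_)
```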

Here, in the image below, I have plotted the best-fit (overfitted) model with the data points, given two attributes X and Y. We will be generating the principal components by viewing the model from different directions.

Source: Author

PC1 — First principal component (generated from view 1)

PC2 — Second principal component (generated from view 2)

As you can see in the image above, we reduced the 2-dimensional model to 1 dimension by generating its principal components from the different views. Note that the number of principal components generated should be less than or equal to the total number of attributes. Also remember that the components generated should hold the orthogonality property, i.e. each of the components should be independent of the others. A small check of this idea is sketched below.
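Here is a minimal sketch of that idea, again using scikit-learn on synthetic X and Y attributes (the data is invented for illustration): the two component directions come out orthogonal, and keeping only PC1 reduces the data from 2-D to 1-D.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
x = rng.normal(size=300)
y = 0.8 * x + rng.normal(scale=0.3, size=300)   # attribute Y correlated with attribute X
data = np.column_stack([x, y])

pca = PCA(n_components=2).fit(data)
pc1, pc2 = pca.components_
print(np.dot(pc1, pc2))                         # ~0.0, i.e. the components are orthogonal

reduced = PCA(n_components=1).fit_transform(data)   # keep only PC1: 2-D reduced to 1-D
print(reduced.shape)                                # (300, 1)
```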

The main idea of principal component analysis (PCA) is to reduce the dimensionality of a dataset consisting of many variables that are correlated with each other, either heavily or lightly, while retaining the variation present in the dataset to the maximum possible extent.
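Putting the pieces together, here is a from-scratch sketch of that idea in NumPy (the function name and data are my own for illustration, not from the original post): standardise the variables, build the covariance matrix, take its eigenvectors, and keep the directions with the largest variance.

```python
import numpy as np

def pca_from_scratch(X, n_components):
    # 1. Standardise each variable (zero mean, unit variance)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardised variables
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigen-decomposition: eigenvalues measure the variance along each eigenvector
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort by decreasing variance and keep the top components
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order]
    # 5. Project the data onto the selected principal components
    return X_std @ components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
print(pca_from_scratch(X, 2).shape)   # (100, 2)
```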

#python #principal-component #ai #machine-learning #data-science
