Principal Component Analysis (PCA) is often used as a data mining technique to reduce the dimensionality of data. In this post, I will show how you can perform PCA and plot its graphs using MATLAB.

What is PCA?

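Principal Component Analysis (PCA) is a statistical method to reduce the dimensionality of data. It assumes that the directions in which the data varies most carry the important information. PCA finds a unit vector, the first principal component, along which the projected data has the largest variance. Equivalently, the line through the data mean in this direction minimizes the average squared perpendicular distance from the points to the line. Each further component is perpendicular to all the previous ones and captures the largest share of the remaining variance.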
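
The equivalence between "largest variance" and "smallest squared distance" can be checked numerically. The following is a small NumPy sketch, not MATLAB code. The toy data and the helper names first_principal_component and mean_sq_distance are made up for illustration. The sketch takes the top eigenvector of the covariance matrix as the first principal component, then compares its fit against 181 other directions.

```python
import numpy as np

def first_principal_component(X):
    """Unit vector along which the centered rows of X vary the most."""
    Xc = X - X.mean(axis=0)                  # deviations from the column means
    C = np.cov(Xc, rowvar=False)             # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]                    # eigenvector of the largest eigenvalue

def mean_sq_distance(X, u):
    """Average squared perpendicular distance from the centered points to the line along unit vector u."""
    Xc = X - X.mean(axis=0)
    along = np.outer(Xc @ u, u)              # part of each point that lies on the line
    return np.mean(np.sum((Xc - along) ** 2, axis=1))

rng = np.random.default_rng(0)
# Hypothetical toy data: 500 points stretched along the direction (1, 1)
t = rng.normal(size=500)
X = np.column_stack([t, t]) + 0.1 * rng.normal(size=(500, 2))

u = first_principal_component(X)
print("first principal component:", u)
print("variance along u:", np.var((X - X.mean(axis=0)) @ u, ddof=1))

# Compare against 181 other unit directions: none should fit the points better
angles = np.linspace(0, np.pi, 181)
others = [mean_sq_distance(X, np.array([np.cos(a), np.sin(a)])) for a in angles]
print("distance for u:", mean_sq_distance(X, u), "best other direction:", min(others))
```

The variance printed for u is the largest eigenvalue of the covariance matrix. No other direction gives a smaller average squared distance than u.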

Why do we need PCA?

Working with a large number of features is computationally expensive, and real data often has a much smaller intrinsic dimension than its number of features. To reduce the dimension of the data we apply PCA. It keeps as much of the variance as possible in a few new variables and discards the directions in which the data barely varies. Some information is lost, but if the discarded components have small variance, the loss is small. Thus, PCA helps fight the curse of dimensionality by keeping just the top few principal components that satisfactorily represent the variation in the data.
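
To see both the small intrinsic dimension and the information that PCA does lose, here is a hedged NumPy sketch on synthetic data. The setup is hypothetical: 10 features generated from only 2 hidden factors plus a little noise. It reports the fraction of variance each component explains. It also shows that the variance lost by keeping k components equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 1000, 10, 2
# Hypothetical data: 10 observed features driven by 2 hidden factors plus small noise
latent = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, p))
X = latent @ mixing + 0.05 * rng.normal(size=(n, p))

Xc = X - X.mean(axis=0)                        # deviations from the column means
C = np.cov(Xc, rowvar=False)                   # p x p sample covariance (divides by n - 1)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]              # decreasing order of eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()            # fraction of total variance per component
print(np.round(explained, 4))

# Keep the top k components and measure what is thrown away
W = eigvecs[:, :k]
X_hat = Xc @ W @ W.T                           # reconstruction from k components
lost = np.sum((Xc - X_hat) ** 2) / (n - 1)     # variance not captured by the k components
print("variance kept:", explained[:k].sum())
print("variance lost:", lost, "sum of discarded eigenvalues:", eigvals[k:].sum())
```

The loss is not zero, but it is tiny compared with the total variance, which is why keeping two components is enough here.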

How is PCA done?

The method for PCA is as follows:

  • Normalize the values of the feature matrix using the normalize function in MATLAB, which by default converts each column to z-scores.
  • Calculate the empirical mean of each column and subtract it to obtain the deviations from the mean.
  • Use these deviations to calculate the p-by-p covariance matrix, where p is the number of features.
  • Find the eigenvectors and eigenvalues of the covariance matrix.
  • Sort the eigenvectors (the columns of the eigenvector matrix) in decreasing order of eigenvalue and compute the cumulative energy content, i.e. the running sum of the sorted eigenvalues, for each eigenvector.
  • Finally, select a subset of the eigenvectors as basis vectors and project the z-scores of the data onto them.
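
The steps above can be sketched end to end in NumPy. In MATLAB, normalize, cov and eig play the corresponding roles. Some details are assumptions for illustration: the function name pca_steps, the 95% cumulative-energy threshold used to choose how many eigenvectors to keep, and the sample data.

```python
import numpy as np

def pca_steps(X, energy=0.95):
    """Follow the six steps; `energy` is a hypothetical threshold for choosing the subset."""
    # Step 1: z-score each column, as MATLAB's normalize does by default
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: deviations from the empirical column means (already ~0 after z-scoring)
    B = Z - Z.mean(axis=0)
    # Step 3: p x p covariance matrix
    C = B.T @ B / (B.shape[0] - 1)
    # Step 4: eigenvalues and eigenvectors (eigh is meant for symmetric matrices)
    eigvals, V = np.linalg.eigh(C)
    # Step 5: sort by decreasing eigenvalue and compute the cumulative energy content
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]
    cum_energy = np.cumsum(eigvals) / eigvals.sum()
    # Step 6: keep the fewest eigenvectors reaching the threshold, then project the z-scores
    L = min(int(np.searchsorted(cum_energy, energy)) + 1, len(eigvals))
    W = V[:, :L]
    scores = Z @ W
    return scores, W, eigvals, cum_energy

rng = np.random.default_rng(2)
# Hypothetical data: 200 samples, 5 features, with feature 1 almost a copy of feature 0
X = rng.normal(size=(200, 5))
X[:, 1] = 2 * X[:, 0] + 0.1 * rng.normal(size=200)

scores, W, eigvals, cum_energy = pca_steps(X)
print("components kept:", W.shape[1])
print("cumulative energy:", np.round(cum_energy, 3))
```

Because two of the five features are nearly redundant, one eigenvalue is close to zero, and fewer than five components reach the 95% threshold. The variance of each column of scores equals the corresponding eigenvalue.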

