# The Essence of Principal Component Analysis (PCA)

A deep dive into the intuition behind PCA, with the math and code fully covered.

## Overview

_PCA is the simplest of the true eigenvector-based multivariate analyses. It is most commonly used as a dimensionality-reduction technique, reducing the dimensionality of large datasets while still explaining most of the variance in the data._ Seems cute, doesn’t it?
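To make "eigenvector-based" a little more concrete, here is a minimal sketch of the idea using NumPy. It assumes PCA is computed from the eigendecomposition of the covariance matrix; the function name `pca_sketch` and the synthetic data are hypothetical and only serve as illustration.

```python
import numpy as np

def pca_sketch(X, n_components):
    """Project X (rows = samples) onto its top principal components."""
    # Center each feature so the covariance is measured around the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (columns are variables).
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the largest-variance directions come first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    # Fraction of the total variance captured by each kept component.
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return X_centered @ components, explained_ratio

# Synthetic (made-up) data: 3-D points that vary mostly along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))
X = t @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(500, 3))

Z, ratio = pca_sketch(X, n_components=1)
print(Z.shape)   # shape of the reduced data
print(ratio)     # share of variance explained by the kept component
```

In this toy setup a single component should capture almost all of the variance, which is the sense in which PCA reduces dimensionality while still explaining most of the variance.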

With this article, I strive to make the idea of PCA intuitive. To follow along, you should know elementary linear algebra and high-school statistics. So, let’s get started.

## The Curse of Dimensionality

The curse of dimensionality refers to the unfavorable consequences of working with high-dimensional data. Let’s take a simple example: suppose we have sampled N data points, where each data point is a d-dimensional vector. For a fixed N, the data becomes sparser as we add dimensions. Think of N points randomly distributed on a line versus N points on a plane: which of the two will have the higher density? The answer is quite intuitive: the line (refer to these figures).
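The line-versus-plane comparison can be checked numerically. The sketch below assumes the points are drawn uniformly from the unit hypercube and uses the average distance to the nearest neighbor as a rough stand-in for density (a larger distance means sparser data); the function name is hypothetical.

```python
import numpy as np

def mean_nearest_neighbor_distance(n_points, n_dims, seed=0):
    """Average distance from each point to its closest neighbor."""
    rng = np.random.default_rng(seed)
    # n_points samples drawn uniformly from the unit hypercube [0, 1)^n_dims.
    points = rng.random((n_points, n_dims))
    # Pairwise Euclidean distances via broadcasting.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Ignore the zero distance from each point to itself.
    np.fill_diagonal(dists, np.inf)
    return dists.min(axis=1).mean()

# Same N, increasing number of dimensions.
for d in (1, 2, 3, 10):
    print(d, mean_nearest_neighbor_distance(200, d))
```

With N held at 200, the printed distance should grow as d increases, which is the sparsity effect described above.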

Fine, as we add dimensions to our data, we make it sparse, but why is that a problem? If our data lacks density, we cannot be confident in our predictions. To train a model that predicts with decent accuracy, the feature space must be well represented by the data, or we run the risk of overfitting. Although we live in the realm of big data, keeping the density constant requires the number of data points (N) to grow exponentially with the number of dimensions, and that much data might not be available. Another problem with high-dimensional data is that we can’t easily visualize data beyond 3 dimensions. Distances also lose meaning in higher dimensions: the nearest and farthest points from any given point end up at nearly the same distance, so distance-based notions of similarity become less informative.
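One way to see distances losing meaning is to compare the nearest and farthest neighbors of a query point. The sketch below assumes uniformly random points in the unit hypercube and measures the relative gap between the farthest and nearest distance; the name `relative_contrast` is a hypothetical helper used here for illustration.

```python
import numpy as np

def relative_contrast(n_points, n_dims, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    # Euclidean distance from the query to every sampled point.
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# Same number of points, increasing number of dimensions.
for d in (2, 10, 100, 1000):
    print(d, relative_contrast(500, d))
```

As d grows, the printed ratio should shrink toward zero: the farthest point is barely farther away than the nearest one, so "near" and "far" stop being useful distinctions.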
