A deep dive into the intuition behind PCA with the Math and Code fully covered.
_PCA is the simplest of the true eigenvector-based multivariate analyses. It is most commonly used as a dimensionality-reduction technique, reducing the dimensionality of large datasets while still explaining most of the variance in the data._ Seems cute, doesn't it?
With this article, I strive to make the idea of PCA intuitive. To follow along, you should know elementary linear algebra and high-school statistics. So, let's get started.
The curse of dimensionality refers to the problems that arise when we analyze data in high-dimensional spaces. Consider a simple example: suppose we have sampled N data points, where each data point is a d-dimensional vector. For a fixed N, the data becomes sparser with every dimension we add. Think of N points randomly distributed on a line versus N points on a plane: which of the two has the higher density? The answer is intuitive: the line (refer to these figures).
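To make the line-versus-plane intuition concrete, here is a minimal sketch, assuming points drawn uniformly from the unit hypercube [0, 1]^d. The helper `mean_nn_distance` is a hypothetical name used only for illustration: it keeps N fixed and measures how far, on average, each point is from its nearest neighbour as d grows. Larger nearest-neighbour distances mean sparser data.

```python
import numpy as np

def mean_nn_distance(n_points: int, dim: int, seed: int = 0) -> float:
    """Average distance from each point to its nearest neighbour, for
    n_points drawn uniformly at random from the unit hypercube [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_points, dim))
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = (x ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    np.maximum(d2, 0.0, out=d2)  # clip tiny negatives caused by round-off
    np.fill_diagonal(d2, np.inf)  # ignore each point's distance to itself
    return float(np.sqrt(d2.min(axis=1)).mean())

if __name__ == "__main__":
    N = 200  # same number of samples in every dimension
    for d in (1, 2, 3, 10, 50):
        print(f"d={d:>2}: mean nearest-neighbour distance = {mean_nn_distance(N, d):.3f}")
```

The exact numbers depend on the random seed, but the trend does not: with the same N, neighbours drift apart as dimensions are added.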
Fine, adding dimensions makes our data sparse, but why is that a problem? If our data is not dense enough, we cannot be confident in our predictions. To train a model that predicts results with decent accuracy, the data must represent the feature space well; otherwise we run the risk of overfitting. Even in the era of big data, keeping the density constant requires the number of samples N to grow exponentially with the number of dimensions (for example, 10 samples per axis means 10^d samples in d dimensions), and that much data may simply not be available. High-dimensional data is also hard to visualize: we cannot easily plot anything beyond 3 dimensions. Finally, distances lose meaning in higher dimensions, because the distances from a point to its nearest and farthest neighbours become almost equal.
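Both the exponential growth in required samples and the loss of meaning in distances can be checked numerically. The sketch below assumes uniformly distributed data; `samples_for_grid` and `relative_contrast` are hypothetical helper names. The first counts the samples needed to keep k samples per axis in d dimensions (k^d). The second computes (max - min) / min of the distances from one random query point to the data: when this ratio approaches zero, the "nearest" and "farthest" points are nearly equally far away.

```python
import numpy as np

def samples_for_grid(points_per_axis: int, dim: int) -> int:
    """Samples needed to keep points_per_axis samples along every axis."""
    return points_per_axis ** dim

def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max - min) / min of the Euclidean distances from one random query
    point to n_points points, all drawn uniformly from [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, dim))
    query = rng.random(dim)
    dist = np.linalg.norm(data - query, axis=1)
    return float((dist.max() - dist.min()) / dist.min())

if __name__ == "__main__":
    for d in (1, 2, 3, 6):
        print(f"d={d}: {samples_for_grid(10, d):,} samples for 10 per axis")
    for d in (2, 10, 100, 1000):
        print(f"d={d:>4}: relative contrast = {relative_contrast(1000, d):.3f}")
```

As d grows, the relative contrast shrinks towards zero, which is why nearest-neighbour style reasoning becomes unreliable in very high dimensions.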
Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data.
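PCA is one such transformation, and the property it retains is variance. Below is a minimal NumPy sketch of the standard eigenvector formulation, assuming the rows of `x` are samples and the columns are features: centre the data, take the eigenvectors of its covariance matrix, and project onto the ones with the largest eigenvalues. The function name `pca` and the toy 3-D dataset are hypothetical, chosen only for illustration.

```python
import numpy as np

def pca(x: np.ndarray, n_components: int):
    """Project x (n_samples x n_features) onto its top n_components
    principal components. Returns (projected data, components,
    fraction of total variance explained by each kept component)."""
    x_centered = x - x.mean(axis=0)
    # Sample covariance matrix of the features: shape (n_features, n_features)
    cov = np.cov(x_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]  # one principal axis per column
    projected = x_centered @ components
    explained = eigvals[:n_components] / eigvals.sum()
    return projected, components, explained

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 3-D data that varies mostly along a single direction, plus small noise
    t = rng.normal(size=(500, 1))
    x = t @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(500, 3))
    z, comps, ratio = pca(x, n_components=1)
    print(z.shape, ratio)
```

Here three dimensions collapse to one while nearly all of the variance survives. In practice, scikit-learn's `sklearn.decomposition.PCA` implements the same idea (computed via SVD) and exposes the retained fraction as `explained_variance_ratio_`.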