Introduction
With the availability of high-performance CPUs and GPUs, it is possible to tackle almost any regression, classification, or clustering problem with machine learning and deep learning models. However, several factors can still create performance bottlenecks when developing such models. A large number of features in the dataset is one of the main factors that affects both the training time and the accuracy of machine learning models.
The Curse of Dimensionality
In machine learning, “dimensionality” simply refers to the number of features (i.e. input variables) in your dataset.
While the performance of a machine learning model can improve as we add features/dimensions, at some point adding more leads to degradation: when the number of features is very large relative to the number of observations in your dataset, many linear algorithms struggle to train effective models. This is called the “Curse of Dimensionality”.
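You can see this effect for yourself on synthetic data. The sketch below, assuming scikit-learn and NumPy are installed, keeps the number of observations fixed and grows the number of features; the feature counts and the choice of 5 informative features are arbitrary illustration values, not part of any standard recipe.

```python
# Sketch: observing the curse of dimensionality with a linear model.
# The sample size and feature counts below are arbitrary example values.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

n_samples = 100  # fixed, small number of observations

for n_features in [5, 50, 500]:
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=5,   # only 5 features actually carry signal
        n_redundant=0,
        random_state=0,
    )
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(f"{n_features:>4} features -> mean CV accuracy: {scores.mean():.3f}")
```

As the feature count grows far beyond the number of observations, the cross-validated accuracy of the linear model typically drops, which is exactly the degradation described above.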

Dimensionality reduction is a set of techniques for shrinking the size of the data while preserving the most important information, thereby mitigating the curse of dimensionality. It plays an important role in the performance of classification and clustering problems.
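As a first taste of the technique this article covers, here is a minimal PCA sketch using scikit-learn's digits dataset. The 95% variance threshold is just an example value I chose for illustration, not a universal rule.

```python
# Sketch: reducing dimensionality with PCA while keeping most of the variance.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)           # 64 pixel features per image
X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

pca = PCA(n_components=0.95)                  # keep enough components for ~95% of variance
X_reduced = pca.fit_transform(X_scaled)

print(f"original dimensions: {X.shape[1]}")
print(f"reduced dimensions:  {X_reduced.shape[1]}")
print(f"variance retained:   {pca.explained_variance_ratio_.sum():.2%}")
```

The reduced dataset keeps most of the original information in far fewer columns, which is what lets downstream classifiers and clustering algorithms train faster without a large loss in quality.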
