What is Dimensionality Reduction?

In machine learning problems, the final classification often depends on a large number of factors. These factors are variables called features. The higher the number of features, the harder it becomes to visualize the training set and to work with it. Dimensionality Reduction is the process of reducing the number of features under consideration, either by selecting a subset of the original features or by deriving a smaller set of new ones.

Before learning the techniques of dimensionality reduction, let's understand why it is important to apply it to our dataset.

Reasons:

1) Datasets often contain an abundance of redundant and irrelevant features, which add noise without adding information.

2) With a fixed number of training samples, predictive power decreases as dimensionality increases (the Hughes phenomenon).

3) Other things being equal, simpler explanations are generally better than complex ones (Occam's razor).

4) It improves the accuracy of a model if the right subset of features is chosen.

5) It reduces overfitting.

6) It reduces computation time.

7) It helps with data compression, and hence reduces storage space.

Dimensionality Reduction Techniques

  1. Percent missing values
  2. Amount of variation
  3. Multicollinearity
  4. Principal Component Analysis (PCA)
  5. Correlation (with the target)
  6. Forward selection
  7. Backward elimination
  8. LASSO
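Several of these techniques are simple filters that can be chained together. Here is a minimal sketch in Python using pandas and scikit-learn; the dataset, column names, and thresholds are all illustrative assumptions, not fixed rules:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical toy dataset; column names are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(40, 10, 100),
    "income": rng.normal(50_000, 8_000, 100),
    "constant": np.ones(100),                      # no variation at all
    "mostly_missing": [np.nan] * 90 + list(range(10)),
})

# 1) Percent missing values: drop columns above a chosen threshold (50% here).
missing_frac = df.isna().mean()
df = df.loc[:, missing_frac < 0.5]

# 2) Amount of variation: drop near-constant columns.
df = df.loc[:, df.var() > 1e-8]

# 4) PCA: project the surviving features onto fewer components.
X = (df - df.mean()) / df.std()                    # standardize first
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                             # (100, 1)
```

The order matters: filters like missing-value and variance checks are cheap and are typically run before more expensive transforms such as PCA. Wrapper methods (forward selection, backward elimination) and embedded methods (LASSO) would instead evaluate feature subsets against a specific model.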

#data-analytics #machine-learning #data-science #statistical-analysis #features
